Apparatus and Method for Rectangular-to-Polar Conversion 



Inventors: 

Dengwei Fu 
Arthur Torosyan 
Alan Willson 



This application claims the benefit of U.S. Provisional Application No. 
60/162,391, filed on October 29, 1999, which is incorporated herein by reference. 

This invention was made with Government support under grant no. MIP 
9632698 awarded by the National Science Foundation. The U.S. Government has 
certain rights in this invention. 

Cross-Reference to Other Applications 

The following applications of common assignee are related to the present 
application, have the same filing date as the present application, and are herein 
incorporated by reference in their entireties: 

"Apparatus and Method for Trigonometric Interpolation," Attorney 
Docket No. 1904.0140001; and 

"Apparatus and Method for Angle Rotation," Attorney Docket No. 
1904.0140002. 

Background of the Invention 

Field of the Invention 

The present invention is related to digital signal processing and digital 
communications. More specifically, the present invention is related to interpolation, 
angle rotation, rectangular-to-polar conversion, and carrier and symbol timing 
recovery for digital processing and digital communications applications. 

Related Art 

Advances in technology have enabled high-quality, low-cost 
communications with global coverage, and provide the possibility for fast Internet 
access and multimedia to be added to existing services. Exemplary emerging 
technologies include cellular mobile radio and digital video broadcasting, both of 
which are described briefly as follows. 

In recent years, cellular mobile radio has experienced rapid growth due to 
the desire for mobility while enjoying the two-way voice services it provides. 
GSM, IS-136 and personal digital cellular (PDC) are among the most successful 
second-generation personal communications (PCS) technologies in the world 
today, and are responsible for providing cellular and PCS services globally. As the 
technology advances, customers will certainly demand more from their wireless 
services. For example, with the explosive growth of the World Wide Web over the 
wired networks, it is desirable to provide Internet services over mobile radio 
networks. One effort to specify the future global wireless access system is known 
as IMT-2000 (Buchanan, K., et al., IEEE Pers. Comm. 4:8-13 (1997)). The goal 
of IMT-2000 is to provide not only traditional mobile voice communications, but 
also a variety of voice and data services with a wide range of applications such as 
multimedia capabilities, Internet access, imaging and video conferencing. It is also 
an aim to unify many existing diverse systems (paging, cordless, cellular, mobile 
satellite, etc.) into a seamless radio structure offering a wide range of services. 
Another principle is to integrate mobile and fixed networks in order to provide 
fixed network services over the wireless infrastructure. Such systems might well 
utilize broadband transport technologies such as asynchronous transfer mode 
(ATM). 

For the applications of IMT-2000, a high-bit-rate service is needed. 
Moreover, for multimedia applications, the system should provide a multitude of 
services each requiring 1) a different rate, and 2) a different quality-of-service 
parameter. Thus, a flexible, variable-rate access with data rates approaching 2 Mb/s 
is proposed for IMT-2000. 

The advent of digital television systems has transformed the classical TV 
channel into a fast and reliable data transmission medium. According to the 
specifications of the DVB project (Reimers, U., IEEE Comm. Magazine 36:104-110 
(1998)), digital TV is no longer restricted to transmitting sound and images 
but instead has become a data broadcasting mechanism which is fully transparent 
to all contents. Digital TV broadcasting by satellite, cable and terrestrial networks 
is currently under intensive development. In a typical system, a DVB 
signal is received from a satellite dish, from cable, or from an antenna (terrestrial 
reception). A modem built into an integrated receiver/decoder (IRD) 
demodulates and decodes the signal. The information received is displayed on 
a digital TV or a multimedia PC. In addition to being used as a digital TV, DVB 
can receive data streams from companies who wish to transmit large amounts of 
data to many reception sites. These organizations may be banks, chains of retail 
stores, or information brokers who wish to offer access to selected Internet sites 
at high data rates. One such system is MultiMedia Mobile (M3), which has a data 
rate of 16 Mb/s. 

For proper operation, these third-generation systems require proper 
synchronization between the transmitter and the receiver. More specifically, the 
frequency and phase of the receiver local oscillator should substantially match those 
of the transmitter local oscillator. When there is a mismatch, an undesirable 
rotation of the symbol constellation will occur at the receiver, which will seriously 
degrade system performance. When the carrier frequency offset is much smaller 
than the symbol rate, the phase and frequency mismatches can be corrected at 
baseband by using a phase rotator. It is also necessary to synchronize the sampling 
clock such that it extracts symbols at the correct times. This can be achieved 
digitally by performing appropriate digital resampling. 

The digital resampler and the direct digital frequency synthesizer (DDS) 
used by the phase rotator are among the most complex components in a receiver 

1904.0140003 
(Cho, K., "A frequency-agile single-chip QAM modulator with beamforming 
diversity," Ph.D. dissertation, University of California, Los Angeles (1999)). Their 
performance is significant in the overall design of a communications modem. For 
multimedia communications, the high-data-rate requirement would impose a 
demand for high computational power. However, for mobile personal 
communication systems, low cost, small size and long battery life are desirable. 
Therefore, it would be desirable to have an efficient implementation of the phase 
rotator, resampler, and DDS in order to perform fast signal processing that 
operates within the available resources. Furthermore, it would be desirable to have 
an efficient synchronization mechanism that uses a unified approach to timing and 
carrier phase corrections. 

For Internet services, it is important to provide instantaneous throughput 
intermittently. Packet data systems allow the multiplexing of a number of users on 
a single channel, providing access to users only when they need it. This way the 
service can be made more cost-effective. However, the user data content of such 
a transmission is usually very short. Therefore, it is essential to acquire the 
synchronization parameters rapidly from the observation of a short signal segment. 

For applications where low power and low complexity are the major 
requirements, such as in personal communications, it is desirable to sample the 
signal at the lowest possible rate, and to have a synchronizer that is as simple as 
possible. Therefore, it is also desirable to have an efficient synchronizer 
architecture that achieves these goals. 

For applications utilizing Orthogonal Frequency Division Multiplexing 
(OFDM), sampling phase shift error produces a rotation of the Fast Fourier 
Transform (FFT) outputs (Pollet, T., and Peeters, M., IEEE Comm. Magazine 
37:80-86 (1999)). A phase correction can be achieved at the receiver by rotating 
the FFT outputs. Therefore, it is also desirable to have an efficient implementation 
structure to perform rotations of complex numbers. 

Summary of the Invention 

The present invention is directed at a rectangular-to-polar converter that 
receives a complex input signal (having $X_0$ and $Y_0$ components) and determines 
an angle $\phi$, which represents the position of the complex input signal in the 
complex plane. In doing so, the rectangular-to-polar converter determines a coarse 
angle $\phi_1$ and a fine angle $\phi_2$, where $\phi = \phi_1 + \phi_2$. 

The coarse angle $\phi_1$ is obtained using a small arctangent table and a 
reciprocal table. These tables provide just enough precision such that the 
remaining fine angle $\phi_2$ is small enough to approximately equal its tangent value. 
Therefore the fine angle $\phi_2$ can be obtained without a look-up table, and the fine 
angle computations are consolidated into a few small multipliers, given a precision 
requirement. 

More specifically, the coarse angle computation is performed by retrieving 
a pre-computed $Z_0 = 1/[X_0]$ value from a reciprocal lookup table (e.g., memory 
device), where $[X_0]$ is a bit-truncated approximation of $X_0$. The $Z_0$ value is 
multiplied by the $Y_0$ component, resulting in a $[Y_0 Z_0]$ value. The coarse 
approximation angle $\phi_1$ is retrieved from a second lookup table that stores 
pre-computed arctan values of $[Y_0 Z_0]$. Next, the input complex signal is multiplied by 
the $[Y_0 Z_0]$ value. This multiplication effectively rotates the input complex number 
by the coarse angle $\phi_1$ back toward the X-axis of the complex plane, resulting in 
an intermediate complex number having an $X_1$ component and a $Y_1$ component. 
Next, the reciprocal lookup table is re-used to determine an approximation of 
$Z_1 = 1/[X_1]$. Then the tangent of the fine angle $\phi_2$ is determined based on $[Z_1 Y_1]$, 
assuming that $\tan \phi_2$ can be substantially approximated as $[Z_1 Y_1]$. In 
embodiments, the Newton-Raphson method is implemented to get a more accurate 
$\tan \phi_2$ result. Finally, based on the smallness of $\tan \phi_2$, the trigonometric function 
value $\tan \phi_2$ is used as an approximation to $\phi_2$, hence requiring no arctan table. 
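
By way of illustration only, the coarse/fine computation summarized above can be sketched in software as follows. This is a minimal sketch under stated assumptions and is not the claimed hardware: the function names, the `index_bits` parameter, the use of `math.frexp` to normalize the table index, and the use of `math.atan` and a division to stand in for the two lookup tables are all hypothetical. The rotation by the coarse angle is assumed here to be realized as multiplication by $(1 - j[Y_0 Z_0])$, which scales the magnitude but leaves the angle correct. The sketch is restricted to inputs with $X_0 > 0$ and $0 \le Y_0 \le X_0$.

```python
import math

def truncate(value, frac_bits):
    """Keep only frac_bits fractional bits (emulates a bit-truncated operand)."""
    scale = 1 << frac_bits
    return math.floor(value * scale) / scale

def reciprocal_lookup(x, index_bits=4):
    """Emulate a small reciprocal table: return 1/[x], where [x] is x with its
    normalized mantissa truncated. Assumes x > 0 (hypothetical helper)."""
    mantissa, exponent = math.frexp(x)      # x = mantissa * 2**exponent, 0.5 <= mantissa < 1
    truncated = truncate(mantissa, index_bits + 1) * 2.0 ** exponent
    return 1.0 / truncated

def rect_to_polar_angle(x0, y0, index_bits=4, newton_steps=1):
    """Approximate the angle of x0 + j*y0 for x0 > 0 and 0 <= y0 <= x0."""
    # Coarse stage: Z0 = 1/[X0], then [Y0*Z0] indexes the arctangent table.
    z0 = reciprocal_lookup(x0, index_bits)
    t = truncate(y0 * z0, index_bits)
    phi1 = math.atan(t)                     # stands in for the small arctan table

    # Rotate back toward the X-axis (assumed form): (x0 + j*y0) * (1 - j*t).
    x1 = x0 + t * y0
    y1 = y0 - t * x0

    # Fine stage: re-use the reciprocal table, then refine with Newton-Raphson.
    z1 = reciprocal_lookup(x1, index_bits)
    for _ in range(newton_steps):
        z1 = z1 * (2.0 - x1 * z1)           # standard Newton-Raphson step for 1/x1
    phi2 = y1 * z1                          # tan(phi2) is used directly as phi2
    return phi1 + phi2
```

In this sketch the coarse stage leaves a residual tangent of at most a few hundredths, which is why the fine angle can be taken directly from the product without an arctangent table.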

Applications of the rectangular-to-polar converter include symbol and 
carrier synchronization, including symbol synchronization for bursty transmissions 
of packet data systems. Other applications include any application requiring the 
rectangular-to-polar conversion of a complex input signal. 

Further features and advantages of the invention, as well as the structure 
and operation of various embodiments of the invention, are described in detail 
below with reference to the accompanying drawings. The drawing in which an 
element first appears is typically indicated by the leftmost character(s) and/or 
digit(s) in the corresponding reference number. 

Brief Description of the Figures 

FIG. 1A illustrates a PSK transmitter. 

FIG. 1B illustrates a PSK receiver. 

FIG. 1C illustrates a block diagram of an OFDM system. 

FIG. 1D illustrates a PSK receiver with carrier and timing recovery. 

FIG. 2 illustrates an interpolation environment. 

FIG. 3 illustrates Lagrange basis polynomials. 

FIG. 4 illustrates a Farrow structure that implements (2.5) and (2.6). 

FIG. 5 illustrates a flowchart 500 representing trigonometric interpolation 
according to embodiments of the present invention. 

FIG. 6A illustrates an impulse response of a Lagrange interpolator. 

FIG. 6B illustrates an impulse response of a trigonometric interpolator 
according to embodiments of the present invention. 

FIG. 7A illustrates a frequency response for N=4 according to 
embodiments of the present invention. 

FIG. 7B illustrates a frequency response for N=32 according to 
embodiments of the present invention. 

FIG. 8A illustrates a signal with two samples/symbol and 100% excess BW 
according to embodiments of the present invention. 

FIG. 8B illustrates an NMSE of the interpolated signal according to 
embodiments of the present invention. 




FIG. 9 illustrates the critical path of the Lagrange cubic interpolator. 

FIG. 10 illustrates a trigonometric interpolator with N=4 according to 
embodiments of the present invention. 

FIG. 11 illustrates a trigonometric interpolator with N=8 according to 
embodiments of the present invention. 

FIG. 12 illustrates a conceptual modification of input samples according 
to embodiments of the present invention. 

FIG. 13 illustrates correcting the offset due to modification of original 
samples according to embodiments of the present invention. 

FIG. 14 illustrates the modified trigonometric interpolator for N=4 
according to embodiments of the present invention. 

FIG. 15 illustrates the modified trigonometric interpolator for N=8 
according to embodiments of the present invention. 

FIGs. 16A-D illustrate a comparison of the amount of interpolation error 
using (A) Lagrange cubic, (B) the trigonometric interpolator 1000, (C) the 
trigonometric interpolator, (D) the optimal structure (to be discussed in 
Section 4). 

FIG. 17 illustrates a flowchart 1700 representing trigonometric 
interpolation according to embodiments of the present invention. 

FIG. 18 illustrates trigonometric interpolation using a table lookup for 
angle rotation according to embodiments of the present invention. 

FIG. 19 illustrates trigonometric interpolation using modified samples 
according to embodiments of the present invention. 

FIG. 20 illustrates normalized impulse responses / of the interpolation 
filters according to embodiments of the present invention. 

FIG. 21 illustrates normalized frequency responses F of the interpolation 
filters according to embodiments of the present invention. 

FIG. 22 illustrates analysis of the frequency responses according to 
embodiments of the present invention. 

FIG. 23 illustrates the effect of a more gradual transition at the band edge 
according to embodiments of the present invention. 

FIG. 24 illustrates reducing the transition bandwidth by increasing N 
according to embodiments of the present invention. 
FIGs. 25A-B illustrate (A) the impulse response of the original filter and the 
modified filter; (B) the equivalent window, according to embodiments of the 
present invention. 

FIG. 26 illustrates forming the frequency response of the discrete-time 
fractional-delay filter according to embodiments of the present invention. 

FIGs. 27A-B illustrate a fractional-delay filter with (A) $\mu = 0.12$ and 
(B) $\mu = 0.5$, using the preliminary N=8 interpolator according to embodiments of 
the present invention. 

FIGs. 28A-D illustrate modification to $F(\Omega)$, and the corresponding $F_\mu(\omega)$ 
according to embodiments of the present invention. 

FIGs. 29A-B illustrate $F_\mu(\omega)$, with $\mu = 0.5$, N=8, (A) before and (B) after 
optimization according to embodiments of the present invention. 

FIGs. 30A-30B illustrate $F_\mu(\omega)$ for $\mu = 0.5$, N=4, (A) before and (B) after 
modification according to embodiments of the present invention. 

FIGs. 31A-31B illustrate $F_\mu(\omega)$, $\mu = 0.5$, simplified N=4 structure, 
(A) before and (B) after modification according to embodiments of the present 
invention. 

FIG. 32 illustrates real and imaginary components of the $(1)\,e^{j\frac{\pi}{2}\mu}$ value 
according to embodiments of the present invention. 

FIG. 33 illustrates a signal with two samples/symbol and 40% excess 
bandwidth according to embodiments of the present invention. 

FIG. 34 illustrates a flowchart 3400 for optimizing trigonometric 
interpolation according to embodiments of the present invention. 

FIG. 35 illustrates a flowchart 3500 for optimizing trigonometric 
interpolation according to embodiments of the present invention. 

FIG. 36 illustrates an optimized interpolator 3600 according to 
embodiments of the present invention. 

FIG. 37 illustrates an optimized interpolator 3700 according to 
embodiments of the present invention. 

FIG. 38 illustrates an angle rotator 3800 according to embodiments of the 
present invention. 

FIG. 39 illustrates an angle rotator 3900 according to embodiments of the 
present invention. 

FIG. 40 illustrates an angle rotator 3900 and example multiplier sizes 
according to embodiments of the present invention. 

FIG. 41 illustrates a flowchart 4100 for angle rotation according to 
embodiments of the present invention. 

FIG. 42 illustrates an angle rotator 3900 and multiplier sizes to achieve 
90.36 dB SFDR according to embodiments of the present invention. 

FIG. 43 illustrates an output spectrum showing 90.36 dB SFDR according 
to embodiments of the present invention. 

FIG. 44 illustrates a modified angle rotator 4400 when only one output is 
needed according to embodiments of the present invention. 

FIG. 45 illustrates a flowchart 4500 for angle rotation when only one 
output is needed according to embodiments of the present invention. 

FIG. 46 illustrates a phase accumulator 4600. 

FIG. 47 illustrates a Quadrature Direct Digital Frequency 
Synthesizer/Mixer (QDDFSM) 4700 according to embodiments of the present 
invention. 

FIG. 48 illustrates an angle rotator 4800 according to embodiments of the 
present invention. 

FIG. 49 illustrates a Booth multiplier according to embodiments of the 
present invention. 

FIG. 50 illustrates an original Booth table 5000. 

FIG. 51 illustrates a negating Booth table 5100 according to embodiments 
of the present invention. 

FIG. 52 illustrates a negating Booth multiplier 5200. 

FIG. 53 illustrates a conditionally negating Booth decoder 5300. 

FIG. 54 illustrates a conditionally negating multiplier 5400. 

FIG. 55 illustrates an angle rotator configured as a quadrature direct 
digital synthesizer (QDDS) 5500 according to embodiments of the present 
invention. 

FIG. 56 illustrates an angle rotator as a cosine-only direct digital 
synthesizer based on angle rotator 3900 according to embodiments of the present 
invention. 

FIG. 57 illustrates an angle rotator as a cosine-only direct digital 
synthesizer based on angle rotator 4400 according to embodiments of the present 
invention. 

FIG. 58 illustrates a common packet format for packet-based 
communications. 

FIG. 59 illustrates a system model for packet based communications. 

FIG. 60 illustrates mean values of a preamble correlator output, for $\theta = 0$, 
according to embodiments of the present invention. 

FIG. 61 illustrates a synchronizer 6100 according to embodiments of the 
present invention. 

FIG. 62 illustrates a flowchart 6200 associated with the synchronizer 6100 
according to embodiments of the present invention. 

FIG. 63 illustrates bias due to truncation. 

FIG. 64 illustrates a synchronizer 6400 according to embodiments of the 
present invention. 

FIGs. 65A-65B illustrate a flowchart 6200 associated with the synchronizer 
6100 according to embodiments of the present invention. 

FIG. 66 illustrates timing variance, $\alpha = 0.4$. 

FIG. 67 illustrates timing jitter variance, $\alpha = 0.1$. 

FIG. 68 illustrates phase jitter variance, $\alpha = 0.1$. 

FIG. 69 illustrates Cartesian-to-polar conversion. 

FIGs. 70A-70B illustrate using Newton-Raphson iteration to find $1/X_1$. 

FIG. 71 illustrates a rectangular-to-polar converter 7100 according to 
embodiments of the present invention. 

FIG. 72 illustrates angle rotation associated with the rectangular-to-polar 
converter 7100 according to embodiments of the present invention. 

FIG. 73 illustrates a flowchart 7300 associated with the synchronizer 7100 
according to embodiments of the present invention. 

FIG. 74 illustrates interpolation in a non-center interval according to 
embodiments of the present invention. 

FIGs. 75A-B illustrate impulse responses of the non-center-interval 
interpolation filter (A) before and (B) after optimization, according to 
embodiments of the present invention. 

FIGs. 76A-B illustrate frequency responses of the non-center-interval 
interpolator (A) before optimization and (B) after optimization, according to 
embodiments of the present invention. 

FIG. 77 illustrates an exemplary computer system 7702, according to 
embodiments of the present invention. 

FIG. 78 illustrates a data rate expansion circuit 7800 according to 
embodiments of the present invention. 

Detailed Description of the Preferred Embodiments 

1. Introduction 

As discussed herein, third-generation and other cutting-edge 
communications systems require proper synchronization between the transmitter 
and the receiver. More specifically, the frequency and phase of the receiver local 
oscillator should substantially match those of the transmitter local oscillator, and 
accurate symbol timing must be achieved. The following section discusses some 
exemplary modulation schemes and configurations, and their related 
synchronization issues. These example configurations are not meant to be limiting, 
and are provided for example purposes only. An overview of the 
present invention is then provided. 

1.1 Exemplary Modulation Schemes and Synchronization Issues 

A key to the evolution of third-generation PCS is the ability to provide 
higher data rates via increased spectral efficiency of the access scheme. The 
IS-136 community intends to add a 200-kHz carrier bandwidth and adopt 8PSK 
modulation. This allows for data rates up to 384 kb/s. 

A simplified 8PSK transmitter 102 and receiver 104 are shown in FIG. 1A 
and FIG. 1B, respectively. The receiver 104, as shown, performs baseband 
sampling. Alternatively, the received signal could be sampled at an IF frequency, 
where the down-conversion to baseband is performed digitally. However, since 
it does not alter the main subject in the present invention, the baseband-sampled 
system is used as an example. 

Referring to FIG. 1B, PSK receiver 104 down-converts an IF input signal 
106 to baseband by multiplication with a local oscillator signal 108 using mixers 
110. After filtering 111, A/D converters 112 sample the down-converted signal 
according to a sampling clock 114 in preparation for logic examination. After 
further filtering 116 and equalization 118, the logic decision devices 120 examine 
the sampled signal to determine a logic output for the two channels. 

During down-conversion, an undesirable rotation of the symbol 
constellation will occur if the frequency and phase of the oscillator signal 108 do 
not match those of the oscillator signal of the transmitter 102. This symbol rotation can 
seriously degrade system performance. When the carrier frequency offset is much 
smaller than the symbol rate, the phase and frequency mismatches can be corrected 
at baseband, using a phase rotator 124, as shown in FIG. 1D. 
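
As a purely illustrative example, the effect of a carrier mismatch and its correction at baseband can be modeled as follows. This is a sketch under assumptions, not a description of the phase rotator 124 itself: the function names are hypothetical, the frequency offset is expressed in cycles per symbol, and both offsets are assumed to be already known to the receiver.

```python
import cmath
import math

def rotate_constellation(symbols, freq_offset, phase_offset):
    """Model the unwanted rotation: sample n is multiplied by
    exp(j*(2*pi*freq_offset*n + phase_offset)), freq_offset in cycles/symbol."""
    return [s * cmath.exp(1j * (2 * math.pi * freq_offset * n + phase_offset))
            for n, s in enumerate(symbols)]

def phase_rotator(samples, freq_offset, phase_offset):
    """Undo the rotation by applying the opposite angle to every sample."""
    return rotate_constellation(samples, -freq_offset, -phase_offset)

# An 8PSK symbol sequence on the unit circle (indices chosen arbitrarily).
constellation = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]
tx = [constellation[k] for k in (0, 3, 5, 6, 1, 7, 2, 4)]

# Received symbols drift away from the ideal points, then are rotated back.
rx = rotate_constellation(tx, freq_offset=0.01, phase_offset=0.3)
corrected = phase_rotator(rx, freq_offset=0.01, phase_offset=0.3)
```

Each corrected sample requires one complex rotation by a sample-dependent angle, which is why an efficient angle rotation structure matters for this function.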

The sampling clock 114 is generated locally in the receiver 104. The logic 
decision devices 120 make more accurate decisions when the sampling instant is 
optimal, i.e., synchronous to the incoming symbols. 

If the timing information can be extracted from the signal 106, it can be 
used to adjust the phase of the sampling clock 114. This adjustment would require 
a voltage controlled oscillator (VCO) to drive the A/D converters 112. In this 
scenario, the digital portion of the circuit 104 needs to keep in synchronization with 
the A/D converters 112, which places strict requirements on the VCO. Moreover, 
changing the phase of the sampling clock 114 would cause jitter. High data-rate 
receivers are more sensitive to such jitter when used in multimedia 
communications. 

Another solution to timing errors is to correct them entirely in the digital 
domain, with the equivalent of A/D sampling adjustment performed by a digital 
resampler 122, as shown in FIG. 1D. This resampler 122 is controlled by a timing 
recovery circuit (not shown) and it attempts to supply the optimal (i.e., 
synchronous) samples to the decision circuits 120. Using the digital resampler 122, the 
timing recovery loop is closed entirely in the digital domain. This allows the 
complete separation of digital components from analog components. 

The digital resampler 122 and a direct digital frequency synthesizer (not 
shown) used by the phase rotator 124 are among the most complex components in 
a receiver (Cho, K., "A frequency-agile single-chip QAM modulator with 
beamforming diversity," Ph.D. dissertation, University of California, Los Angeles 
(1999)). Their performance is significant in the overall design of the modem. For 
multimedia communications, the high-data-rate requirement imposes a demand for 
high computational power. However, for mobile personal communication systems, 
low cost, small size, and long battery life are desirable. Therefore, efficient 
implementation is the key to implementing fast signal processing within the 
available resources. It is also desirable to provide an efficient synchronization 
mechanism by using a unified approach to timing and carrier phase corrections. 
This can be accomplished by sharing resources between the resampler 122 and the 
phase rotator 124. 

As for digital video broadcasting (DVB) systems, the most 
challenging of all DVB transmissions is the one used in terrestrial channels 
(DVB-T) due to the presence of strong echoes which characterize the propagation 
medium. A common approach for DVB-T is based on Coded-OFDM (orthogonal 
frequency division multiplexing). The major benefit of OFDM is that the serial 
baseband bitstream which needs to be transmitted is distributed over many 
individual subcarriers. Such spreading makes the signal robust against the effects 
of multipath and narrowband interference. The simplified block diagram of an 
OFDM modem 108 is shown in FIG. 1C. 

FIG. 1C illustrates an orthogonal frequency division multiplexing (OFDM) 
system 126 having an OFDM transmitter 128 and an OFDM receiver 130. For 
the OFDM system 126, synchronization errors produce a rotation of the fast 
Fourier Transform (FFT) outputs of the OFDM receiver 130 (Pollet, T., and 
Peeters, M., IEEE Comm. Magazine 37:80-86 (1999)). A sampling phase 
correction for the received signals can be achieved by rotating the FFT outputs at 
the receiver. For FFT rotation, it is desirable to have an efficient implementation 
structure to perform rotations of complex numbers. 
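
The relationship between a sampling phase error and the rotation of the FFT outputs can be illustrated with the following sketch. It is offered under assumptions only: the subcarriers are indexed 0 to N-1, the timing offset is expressed as a fraction of a sample period and is assumed known, a direct DFT is used in place of an FFT for brevity, and all function names are hypothetical.

```python
import cmath
import math

def ofdm_samples(subcarrier_symbols, timing_offset):
    """Sample the sum of N subcarriers at instants m + timing_offset."""
    n = len(subcarrier_symbols)
    return [sum(a * cmath.exp(2j * math.pi * k * (m + timing_offset) / n)
                for k, a in enumerate(subcarrier_symbols))
            for m in range(n)]

def dft(x):
    """Direct DFT, normalized by 1/N so an ideal receiver returns the symbols."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)) / n
            for k in range(n)]

def derotate(fft_outputs, timing_offset):
    """Rotate output k back by the angle 2*pi*k*timing_offset/N."""
    n = len(fft_outputs)
    return [y * cmath.exp(-2j * math.pi * k * timing_offset / n)
            for k, y in enumerate(fft_outputs)]

# QPSK symbols on 8 subcarriers, received with a quarter-sample timing error.
symbols = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
received = dft(ofdm_samples(symbols, timing_offset=0.25))
corrected = derotate(received, timing_offset=0.25)
```

In this model the rotation angle grows linearly with the subcarrier index, so every FFT output needs its own complex rotation.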

The example applications and modulation schemes described above in this 
section were provided for illustrative purposes only, and are not meant to be 
limiting. Other applications and combinations of such applications will be apparent 
to persons skilled in the relevant art(s) based on the teachings contained herein. 
These other applications are within the scope and spirit of the present invention. 

1.2 Overview 

The following is an overview of the sections that follow. 
Sections 2, 3 and 4, discussed herein, present a novel interpolation method 
for digital resampling using a trigonometric polynomial. In Section 2, after a brief 
review of interpolation methods, particularly those using a conventional 
polynomial, the trigonometric interpolation method is introduced. Efficient 
implementation structures for trigonometric interpolation are given. The performance, 
hardware complexity and computational delay are compared with conventional 
polynomial interpolators. The trigonometric-polynomial-based resampler can use 
the same hardware as is employed in the phase rotator for carrier synchronization, 
thus further reducing the total complexity in the synchronization circuitry. 

In Section 3, a signal processing approach is used to analyze the 
interpolation method devised in Section 2. It shows how an arbitrary frequency 
response is achieved by applying a simple modification to the original interpolation 
algorithm. This enables the interpolator to also perform matched filtering of the 
received signal. 

The approaches in Section 3 can be employed to improve the interpolator 
performance by optimizing the frequency response of the continuous-time 
interpolation filter. This method is based on optimizing the performance by 
conceptually reconstructing the continuous-time signal from existing samples. 
From the point of view of designing digital resamplers, however, what we are 
actually interested in are new samples corresponding to the new sampling instants. 
In Section 4, we optimize the interpolation filter such that the error in producing 
a new sample corresponding to every resampling instant is minimized, hence further 
improving the overall interpolation accuracy. 

Section 5 presents an angle rotation processor that can be used to efficiently 
implement the trigonometric resampler and the carrier phase rotator. This structure 
can also implement the resampler for an OFDM receiver, which rotates the FFT 
outputs. It has many other practical applications. 

The discussions in the previous Sections have assumed that the sampling 
mismatch that is supplied to the resampler is known. The problem of obtaining the 
synchronization parameters is studied in Section 6. For burst mode transmissions 
in packet data systems, we present an efficient architecture for feedforward 
symbol-timing and carrier-phase estimation. 

Section 7 presents an efficient implementation of a key component in the 
feedforward synchronizer of Section 6, as well as in many other such synchronizers. 
This involves computing the angle from the real and imaginary components of a 
complex number. The discussion, however, extends to a general problem of 
Cartesian-to-polar conversion, which is encountered in many communication 
applications. An architecture that efficiently accomplishes this conversion is 
presented. 

Section 8 presents an exemplary computer system in which the invention 
can be operated. 

Section 9 includes various appendices. 

Further discussions related to materials in Sections 2-7 are included in 
Dengwei Fu, "Efficient Synchronization for Multimedia Communications," Ph.D. 
dissertation, University of California, Los Angeles, 2000, which is incorporated by 
reference in its entirety. 

Additionally, the following articles are herein incorporated by reference: 

D. Fu and A. N. Willson, Jr., "A high-speed processor for digital sine/cosine 
generation and angle rotation," in Conf. Record 32nd Annual Asilomar Conference 
on Signals, Systems and Computers, vol. 1, pp. 177-181, Nov. 1998; 

D. Fu and A. N. Willson, Jr., "Interpolation in timing recovery using a 
trigonometric polynomial and its implementation," in Proc. GLOBECOM 1998, 
Comm. Theory Mini-Conference Record, pp. 173-178, Nov. 1998; 

D. Fu and A. N. Willson, Jr., "Design of an improved interpolation filter 
using a trigonometric polynomial," in Proc. Int. Symp. Circuits & Systems, vol. 4, 
pp. 363-366, May 30-June 3, 1999; 

D. Fu and A. N. Willson, Jr., "A high-speed processor for rectangular-to-polar 
conversion with applications in digital communications," in Proc. 
GLOBECOM 1999, vol. 4, pp. 2172-2176, Dec. 1999; 

D. Fu and A. N. Willson, Jr., "A fast synchronizer for burst modems with 
simultaneous symbol timing and carrier phase estimations," in Proc. Int. Symp. 
Circuits & Systems, vol. 3, pp. 379-382, May 28-31, 2000; and 

D. Fu and A. N. Willson, Jr., "Optimal interpolator using a trigonometric 
polynomial," in Proc. 43rd Midwest Symp. Circuits & Systems, 4 pages, Aug. 8-11, 
2000. 


2. Interpolation Using a Trigonometric Polynomial 

As discussed in Section 1, when an analog-to-digital converter (ADC) is
clocked at a fixed rate, the resampler must provide the receiver with correct
samples, as if the sampling were synchronized to the incoming symbols. How can the
resampler recover the synchronized samples by digital means without altering the
sampling clock? Since the input analog signal to the ADC is bandlimited, as long
as the sampling rate is at least twice the signal bandwidth, according to the
sampling theorem, the sampled signal carries as much information as the
continuous-time signal. Therefore, the value of the original continuous-time signal
at an arbitrary point can be evaluated by applying an interpolation filter (e.g., sinc
interpolation) to the samples. Hence, the design of the resampler has been
transformed to the design of effective interpolation filters or, in other words,
fractional-delay filters with variable delay.

There are numerous methods for designing fractional-delay filters. These
filters have different coefficients for different delays. Thus, to implement
variable-delay interpolation filters, one could compute one set of coefficients for each
quantized delay value and store them in a memory. Then, in real-time, depending
on the fractional delay extracted from the incoming signal, the corresponding
coefficients could be loaded. However, this method is likely to result in a large
coefficient memory.

To design low-cost modems, a large coefficient memory is undesirable.
Gardner, et al., have shown that polynomials can be incorporated to compute the
desired samples that are synchronous with the transmitted samples (Gardner, F.M.,
IEEE Trans. Comm. 41:502-508 (1993); Erup, L., et al., IEEE Trans. Comm.
41:998-1008 (1993)). In this case, an extensive coefficient memory is not needed.
Moreover, the polynomial-based structure can be implemented efficiently with a
so-called Farrow structure (Farrow, C., "A continuously variable digital delay
element," in Proc. IEEE Int. Symp. Circuits Syst. (June 1988), pp. 2641-2645).
This method is reviewed in Section 2.1. Although this approach achieves
reasonable performance, the hardware complexity grows rapidly as the number of
samples used to calculate each new sample is increased for better accuracy. In
addition, given a fractional delay μ, to produce a new sample using a degree N-1
polynomial, there will be N-1 sequential multiplications that involve μ, since we
must compute μ raised to the (N-1)-th power times a data value. Thus, the critical
data path gets longer as N increases, thereby creating a limitation on the achievable
data rate.

Starting in Section 2.2, a new approach to interpolation is introduced. 
Instead of approximating the continuous-time signal with a conventional (i.e., 
algebraic) polynomial, a trigonometric polynomial is used according to the present 
invention. First, some background information is given. Next, the detailed 
implementation is discussed. We then evaluate and compare the performance and 
computational complexity of the algebraic polynomial interpolation to that of our 
method, giving numerical results. 

2.1 Interpolation Using an Algebraic Polynomial

To interpolate using function values y(n) at N equally-spaced sample points,
also referred to as a "base point set," we can fit an algebraic polynomial of degree
N-1 to the data, as in FIG. 2. As explained in (Gardner, F.M., IEEE Trans. Comm.
41:502-508 (1993)), there should be an even number of samples in the base point
set, and the interpolation should be performed only in the center interval of the base
point set. That is, N is restricted to be even. In other words, given the 4 sample points
in FIG. 2, namely y(-1), y(0), y(1), and y(2), the interpolation is performed at
offset μ between the points y(0) and y(1) to determine the point 202 on the curve
P(t).

It seems that one would have to solve for the coefficients of the (N-1)-th
degree polynomial from these available samples before the synchronized (i.e.,
interpolated) samples can be computed. However, a method devised by Farrow
(Farrow, C., "A continuously variable digital delay element," in Proc. IEEE Int.
Symp. Circuits Syst. (June 1988), pp. 2641-2645) can compute the synchronized
sample from the available samples efficiently with use of an algorithm that is well
suited for VLSI implementation. To illustrate, we consider the following example
of interpolation using a cubic Lagrange polynomial. Without loss of generality, let
us assume the sampling interval is T_s = 1. Using the Lagrange formula for N = 4,
the synchronized samples can be computed as

$$ y(\mu) = y(-1)C_{-1}(\mu) + y(0)C_0(\mu) + y(1)C_1(\mu) + y(2)C_2(\mu) \tag{2.1} $$

where C_n(μ), n = -1, 0, 1, 2, are the third degree polynomials that are shown in
FIG. 3.



Obviously,

$$ C_n(\mu) = \begin{cases} 1, & \mu = nT_s,\ n \text{ an integer} \\ 0, & \text{all other integers.} \end{cases} \tag{2.2} $$



Thus, y(μ) in (2.1), the sum of polynomials C_n(μ) weighted by the y(n) values, must
be a third degree polynomial and must go through the samples y(-1), y(0), y(1) and
y(2). Writing C_n(μ) as

$$ C_n(\mu) = \sum_{k=0}^{3} c_{nk}\,\mu^k \tag{2.3} $$

the coefficients of C_n(μ) are fixed numbers. They are independent of μ. We
can rewrite (2.1) as

$$ y(\mu) = \sum_{n=-1}^{2} y(n) \sum_{k=0}^{3} c_{nk}\,\mu^k = \sum_{k=0}^{3} \left( \sum_{n=-1}^{2} y(n)\,c_{nk} \right) \mu^k = \sum_{k=0}^{3} v(k)\,\mu^k \tag{2.4} $$

where

$$ v(k) = \sum_{n=-1}^{2} y(n)\,c_{nk}. \tag{2.5} $$



To minimize the number of multipliers, we can use a nested evaluation of (2.4), as

$$ y(\mu) = ((v(3)\mu + v(2))\mu + v(1))\mu + v(0). \tag{2.6} $$

A Farrow structure 400 (for N = 4) that implements equations (2.5) and
(2.6) is shown in FIG. 4. It consists of multiplications of data by fixed coefficients,
and data multipliers, as well as addition operations.
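The Farrow evaluation for N = 4 can be sketched in Python as follows. This is a minimal illustration, not the circuit of FIG. 4: the coefficient table `C` holds the standard cubic Lagrange coefficients for base points -1, 0, 1, 2 (an assumption, since the document shows the polynomials only graphically), and the function name `farrow_cubic` is hypothetical.

```python
# Cubic Lagrange coefficients c_nk for base points n = -1, 0, 1, 2 (assumed
# standard values). Row i corresponds to n = i - 1; column k multiplies mu**k.
C = [
    [0.0, -1.0 / 3.0, 1.0 / 2.0, -1.0 / 6.0],   # C_{-1}(mu)
    [1.0, -1.0 / 2.0, -1.0, 1.0 / 2.0],          # C_0(mu)
    [0.0, 1.0, 1.0 / 2.0, -1.0 / 2.0],           # C_1(mu)
    [0.0, -1.0 / 6.0, 0.0, 1.0 / 6.0],           # C_2(mu)
]


def farrow_cubic(samples, mu):
    """Interpolate between y(0) and y(1) at offset mu.

    samples: [y(-1), y(0), y(1), y(2)]
    """
    # Equation (2.5): fixed-coefficient combinations of the data.
    v = [sum(y * C[i][k] for i, y in enumerate(samples)) for k in range(4)]
    # Equation (2.6): nested (Horner) evaluation, three multiplications by mu.
    return ((v[3] * mu + v[2]) * mu + v[1]) * mu + v[0]


if __name__ == "__main__":
    # Samples of p(t) = t**3 - 2*t + 1 at t = -1, 0, 1, 2.
    print(farrow_cubic([2.0, 1.0, 0.0, 5.0], 0.5))
```

Because a cubic Lagrange interpolator reproduces any cubic exactly, sampling a cubic and interpolating at μ = 0.5 is a convenient sanity check. The three nested multiplications by μ in the return statement are the sequential data multipliers that lengthen the critical path as N grows.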

2.2 The Trigonometric Polynomial Method

To solve the problems discussed in Section 2.1, the present invention
utilizes a trigonometric polynomial to fit the asynchronous samples in FIG. 2.
Using the notation W_N = e^{-j2π/N}, for t ∈ [-N/2 + 1, N/2], the polynomial may be written
as:

$$ y(t) = \frac{1}{N}\left( \sum_{k=-N/2+1}^{N/2-1} c_k W_N^{-kt} + \frac{1}{2} c_{N/2} W_N^{-(N/2)t} + \frac{1}{2} c_{N/2} W_N^{(N/2)t} \right) \tag{2.7} $$

The polynomial must cross the N samples. Thus, the coefficients c_k can be
determined by solving the N linear equations in N unknowns:

$$ y(n) = \frac{1}{N} \sum_{k=-N/2+1}^{N/2} c_k W_N^{-kn}, \quad n = -\frac{N}{2}+1, \ldots, \frac{N}{2} \tag{2.8} $$

whose solution is

$$ c_k = \sum_{n=-N/2+1}^{N/2} y(n) W_N^{kn}, \quad k = -\frac{N}{2}+1, \ldots, \frac{N}{2}. \tag{2.9} $$

The expression in (2.9) is simply the N-point discrete Fourier transform
(DFT). This suggests that, given N equally-spaced samples, we can compute the
DFT of these samples as in (2.9) to obtain the coefficients of the interpolating
trigonometric polynomial in (2.7). Then, for a given offset μ, the synchronized
sample y(μ) can be computed using that polynomial as:

$$ y(\mu) = \frac{1}{N}\left( \sum_{k=-N/2+1}^{N/2-1} c_k W_N^{-k\mu} + c_{N/2} \cos \pi\mu \right) \tag{2.10} $$

Since c_k and c_{-k} are conjugates, this equation can be simplified as

$$ y(\mu) = \frac{1}{N} \mathrm{Re}\left[ c_0 + 2 \sum_{k=1}^{N/2-1} c_k W_N^{-k\mu} + c_{N/2}\, e^{j\pi\mu} \right] \tag{2.11} $$

Flowchart 500 in FIG. 5 summarizes the interpolation between two sample
points at an offset μ using a trigonometric polynomial, where the two data samples
that are to be interpolated are part of a set of N data samples (see FIG. 2). The
flowchart is described as follows.

In step 502, a set of N data samples is received having the two data
samples that are to be interpolated.

In step 504, coefficients of a trigonometric polynomial are determined based
on the set of N data samples, according to equation (2.9). In doing so, the N data
samples are multiplied by a complex scaling factor W_N^{kn} to generate a k-th coefficient
for the trigonometric polynomial, wherein W_N = e^{-j2π/N}, and wherein n represents an
n-th data sample of said N data samples.

In step 506, the trigonometric polynomial is evaluated at the offset μ based
on equation (2.10) or (2.11).

In step 508, the real part of the evaluated trigonometric polynomial is
determined and represents the desired interpolation value 202 in FIG. 2.
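The steps of flowchart 500 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the samples are real-valued, N is even, the list is ordered n = -N/2+1, ..., N/2, and the function names `trig_coeffs` and `trig_interp` are hypothetical rather than part of the described apparatus.

```python
import cmath
import math


def trig_coeffs(samples):
    """Step 504: coefficients c_k, k = 0..N/2, per equation (2.9)."""
    N = len(samples)
    ns = range(-N // 2 + 1, N // 2 + 1)   # n = -N/2+1, ..., N/2 (N even)
    return [
        # W_N^{kn} = exp(-j*2*pi*k*n/N)
        sum(y * cmath.exp(-2j * math.pi * k * n / N) for n, y in zip(ns, samples))
        for k in range(N // 2 + 1)
    ]


def trig_interp(samples, mu):
    """Steps 506-508: evaluate equation (2.11) at offset mu."""
    N = len(samples)
    c = trig_coeffs(samples)
    acc = c[0]
    for k in range(1, N // 2):
        # W_N^{-k*mu} = exp(+j*2*pi*k*mu/N): a rotation of c_k
        acc += 2 * c[k] * cmath.exp(2j * math.pi * k * mu / N)
    acc += c[N // 2] * cmath.exp(1j * math.pi * mu)
    return acc.real / N


if __name__ == "__main__":
    # Samples of cos(pi*t/2) at t = -1, 0, 1, 2.
    print(trig_interp([0.0, 1.0, 0.0, -1.0], 0.5))
```

For samples of cos(πt/2) taken at t = -1, 0, 1, 2, the interpolated value at μ = 0.5 equals cos(π/4), since that sinusoid is itself a term of the trigonometric polynomial. The interpolator also returns the original samples at μ = 0 and μ = 1.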
There are three issues that are to be considered in evaluating an
interpolation scheme: 1) accuracy of interpolation, 2) complexity of implementation
and 3) latency. In the following sections, the trigonometric interpolation method is
compared to the traditional polynomial method, particularly the Lagrange
interpolator, in all these categories.



2.3 Performance Comparisons

Let us first derive the impulse responses of the interpolators. With N
samples, N an even integer, the Lagrange formula (2.1) is

$$ y(\mu) = \sum_{n=-N/2+1}^{N/2} y(n)\,C_n(\mu). \tag{2.12} $$

In addition, the interpolation is performed in the center interval. Thus 0 ≤ μ ≤ 1. Let
us define a new function f(t) such that

$$ f(\mu - n) = C_n(\mu), \quad 0 \le \mu \le 1, \quad -N/2+1 \le n \le N/2. \tag{2.13} $$

Using the example of FIG. 3, defining t = μ - n, we have



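$$ f(t) = \begin{cases} C_2(t+2), & -2 \le t < -1 \\ C_1(t+1), & -1 \le t < 0 \\ C_0(t), & 0 \le t < 1 \\ C_{-1}(t-1), & 1 \le t \le 2 \\ 0, & \text{otherwise.} \end{cases} \tag{2.14} $$

Thus, the Lagrange formula becomes

$$ y(t) = \sum_{n=-N/2+1}^{N/2} y(n)\,C_n(t) = \sum_{n=-N/2+1}^{N/2} y(n)\,f(t-n). \tag{2.15} $$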

Therefore, the approach to reconstruct the continuous signal using the Lagrange
polynomial is in fact equivalent to applying an interpolation filter f(t) to the
available samples, with f(t) being a piecewise polynomial. The interpolator's
impulse response f(t) obtained from (2.14) is shown in FIG. 6A.

Taking the Fourier transform of f(t), we obtain its frequency response. This
allows us to evaluate the interpolation accuracy by examining the frequency
response of the interpolation filter. The frequency response 702 of the Lagrange
cubic interpolator (N = 4) is shown in FIG. 7A. The horizontal axis is the
normalized frequency f/F_s, with F_s = 1/T_s. An ideal frequency response should
have value one in the passband (0 ≤ f/F_s < 0.5) and be zero in the stopband
(f/F_s ≥ 0.5).

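For the interpolator using a trigonometric polynomial, we can express y(μ)
in terms of y(n) by substituting (2.9) into (2.10):

$$
\begin{aligned}
y(\mu) &= \frac{1}{N}\left( \sum_{k=-N/2+1}^{N/2-1} c_k W_N^{-k\mu} + c_{N/2} \cos \pi\mu \right) \\
&= \frac{1}{N}\left( \sum_{k=-N/2+1}^{N/2-1} \left( \sum_{n=-N/2+1}^{N/2} y(n) W_N^{kn} \right) W_N^{-k\mu} + \left( \sum_{n=-N/2+1}^{N/2} y(n)(-1)^n \right) \cos \pi\mu \right) \\
&= \frac{1}{N} \sum_{n=-N/2+1}^{N/2} y(n) \left( \sum_{k=-N/2+1}^{N/2-1} W_N^{-k(\mu-n)} + (-1)^n \cos \pi\mu \right) \\
&= \frac{1}{N} \sum_{n=-N/2+1}^{N/2} y(n) \left( 1 + 2 \sum_{k=1}^{N/2-1} \cos \frac{2\pi k}{N}(\mu-n) + \cos \pi(\mu-n) \right)
\end{aligned}
\tag{2.16}
$$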

Defining

$$ f(t) = \begin{cases} 1 + 2 \displaystyle\sum_{k=1}^{N/2-1} \cos \frac{2\pi k}{N} t + \cos \pi t, & |t| \le N/2 \\ 0, & |t| > N/2 \end{cases} \tag{2.17} $$

we have

$$ y(\mu) = \frac{1}{N} \sum_{n=-N/2+1}^{N/2} y(n)\,f(\mu - n) = \frac{1}{N}\, y \otimes f. \tag{2.18} $$

The impulse response f(t) in (2.17) is shown in FIG. 6B. The corresponding
frequency response 704 of the trigonometric interpolator (for N=4) is shown in
FIG. 7A in thin lines.
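The impulse response of (2.17) and the filtering form of (2.18) can be sketched in Python as follows. This is an illustrative sketch only; the function names `trig_kernel` and `interp_by_kernel` are hypothetical, and treating |t| = N/2 as inside the support is an assumption about the boundary of (2.17).

```python
import math


def trig_kernel(t, N):
    """Impulse response f(t) of the trigonometric interpolator, equation (2.17)."""
    if abs(t) > N / 2:
        return 0.0
    total = 1.0 + math.cos(math.pi * t)
    for k in range(1, N // 2):
        total += 2.0 * math.cos(2.0 * math.pi * k * t / N)
    return total


def interp_by_kernel(samples, mu):
    """Equation (2.18): y(mu) = (1/N) * sum_n y(n) f(mu - n).

    samples are ordered n = -N/2+1, ..., N/2 (N even).
    """
    N = len(samples)
    ns = range(-N // 2 + 1, N // 2 + 1)
    return sum(y * trig_kernel(mu - n, N) for n, y in zip(ns, samples)) / N


if __name__ == "__main__":
    print([round(trig_kernel(t, 4), 12) for t in (-1, 0, 1, 2)])
    print(interp_by_kernel([0.0, 1.0, 0.0, -1.0], 0.5))
```

The kernel equals N at t = 0 and vanishes at the other integer sample positions, which is why the interpolated curve passes through the original samples after the 1/N scaling.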

By comparing the frequency responses of the two interpolators, we can see
that the trigonometric interpolation response 704 has a sharper roll-off in the
transition band and more rapid attenuation in the stopband than the Lagrange
response 702. These traits are enhanced as N increases, as demonstrated in
FIG. 7B. For N=32, the trigonometric response 708 has a sharper roll-off than the
Lagrange response 706, as shown in FIG. 7B.

Next we verify these observations by interpolating the samples of a practical
time-domain signal. As an example, we interpolate a baseband signal with raised
cosine spectrum and roll-off factor α = 1.0, sampled at two samples per symbol
period, as shown in FIG. 8A.

The interpolation accuracy here is measured as the normalized mean-squared
difference between the signal interpolated with an ideal interpolator and
the signal interpolated with the practical interpolator. The normalized
mean-squared error (NMSE), discussed above, is calculated for both the Lagrange
interpolator and the trigonometric interpolator for a range of typical values of N.
The results are plotted in FIG. 8B.
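One possible way to compute such an error measure is sketched below in Python. The normalization by the energy of the ideally interpolated signal is an assumption, since the exact normalization is not stated here, and the function name `nmse` is hypothetical.

```python
def nmse(ideal, approx):
    """Normalized mean-squared error between two equal-length sequences.

    Assumed definition: sum of squared differences divided by the energy
    of the ideally interpolated signal.
    """
    if len(ideal) != len(approx):
        raise ValueError("sequences must have the same length")
    num = sum((a - b) ** 2 for a, b in zip(ideal, approx))
    den = sum(a ** 2 for a in ideal)
    return num / den


if __name__ == "__main__":
    print(nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```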

Our test results demonstrate that the performance is improved with the
trigonometric method. Using the same number of samples to interpolate, the
proposed method produces a smaller NMSE, and the performance gain becomes
greater as the number of samples increases.

2.4 Efficient Implementation Structures

Recalling from Section 2.2, the trigonometric interpolation algorithm
includes substantially two steps:

Step 1. Given a number of data samples N, calculate the Fourier coefficients
c_k, k = 0, ..., N/2, using (2.9). In a preferred embodiment, an even number of N data
samples are used. In other embodiments, an odd number of data samples are used.

Step 2. Compute the synchronized sample y(μ) for any given μ according
to (2.11).

The first step involves multiplying the data samples by complex scaling
factors W_N^{kn}. Since these factors lie on the unit circle, the computation in Step 1
can be simplified. Let us examine the case when N = 4:

Example 2.1: For N = 4, the Fourier coefficients are obtained as:

$$
\begin{aligned}
c_0 &= y(-1) + y(0) + y(1) + y(2) \\
c_1 &= [y(0) - y(2)] + j[-y(1) + y(-1)] \\
c_2 &= y(0) - y(1) + y(2) - y(-1).
\end{aligned}
\tag{2.19}
$$

As seen in (2.19), there is no nontrivial scaling multiplier required for N = 4.

Example 2.2: We now compute the coefficients c_k in (2.9) for N = 8. Using
trigonometric identities, we can obtain the following simple form for c_k, k = 0, ..., 4:

$$
\begin{aligned}
c_0 &= y(-3) + y(-2) + y(-1) + y(0) + y(1) + y(2) + y(3) + y(4) \\
c_1 &= \{ y(0) - y(4) + [-y(-3) + y(1) + y(-1) - y(3)] \cos(\pi/4) \} \\
&\quad + j\{ y(-2) - y(2) + [y(-3) - y(1) + y(-1) - y(3)] \cos(\pi/4) \} \\
c_2 &= \{ -y(-2) + y(0) - y(2) + y(4) \} + j\{ -y(-3) + y(-1) - y(1) + y(3) \} \\
c_3 &= \{ y(0) - y(4) + [y(-3) - y(1) + (-y(-1) + y(3))] \cos(\pi/4) \} \\
&\quad + j\{ -y(-2) + y(2) + [y(-3) - y(1) + (y(-1) - y(3))] \cos(\pi/4) \} \\
c_4 &= -y(-3) + y(-2) - y(-1) + y(0) - y(1) + y(2) - y(3) + y(4)
\end{aligned}
\tag{2.20}
$$

The only non-trivial scaling multiplications are those multiplications by
cos(π/4). It appears that four such multiplications are needed to compute all the
complex coefficients c_k, k = 0, ..., 4. However, if we examine the data being
multiplied by cos(π/4) (those terms embraced by the [ ] brackets), we observe that
they are either the sums or differences of the [y(-3) - y(1)] and [y(-1) - y(3)] values.
Therefore, we can compute [y(-3) - y(1)] cos(π/4) and [y(-1) - y(3)] cos(π/4), then
use these two products to generate the c_k coefficients. Thus, only two scaling
multiplications are needed in computing all the coefficients.
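The two-multiplier computation of Example 2.2 can be checked with a short Python sketch. The dictionary-based sample layout and the function names `coeffs_n8` and `direct_dft_n8` are hypothetical choices made for illustration; the second function evaluates (2.9) directly so the two results can be compared.

```python
import cmath
import math


def coeffs_n8(y):
    """c_0..c_4 for N = 8 using only two scaling multiplications.

    y is a dict mapping n = -3..4 to real sample values.
    """
    cos45 = math.cos(math.pi / 4)
    p = (y[-3] - y[1]) * cos45   # first scaling multiplication
    q = (y[-1] - y[3]) * cos45   # second scaling multiplication
    c0 = sum(y[n] for n in range(-3, 5))
    c1 = complex(y[0] - y[4] - p + q, y[-2] - y[2] + p + q)
    c2 = complex(-y[-2] + y[0] - y[2] + y[4], -y[-3] + y[-1] - y[1] + y[3])
    c3 = complex(y[0] - y[4] + p - q, -y[-2] + y[2] + p + q)
    c4 = sum((-1) ** n * y[n] for n in range(-3, 5))
    return [c0, c1, c2, c3, c4]


def direct_dft_n8(y):
    """Reference: equation (2.9) evaluated directly for k = 0..4."""
    return [
        sum(y[n] * cmath.exp(-2j * math.pi * k * n / 8) for n in range(-3, 5))
        for k in range(5)
    ]


if __name__ == "__main__":
    data = {-3: 0.3, -2: -1.2, -1: 2.0, 0: 0.7, 1: -0.4, 2: 1.1, 3: 0.9, 4: -2.5}
    for fast, ref in zip(coeffs_n8(data), direct_dft_n8(data)):
        print(fast, ref)
```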

Having observed the simplicity in the first step, let us focus on the second
step. The second step may look complicated because of the complex
multiplications c_k W_N^{-kμ} and c_{N/2} e^{jπμ}. However, since |W_N^{-kμ}| = |e^{jπμ}| = 1, these
products are just rotations of points c_k and c_{N/2} in the complex plane. Furthermore,
this is the same type of operation performed in the phase rotation for carrier
recovery by the phase rotator 124 that is shown in FIG. 1D. This suggests that we
can reduce the total complexity of the synchronization circuitry by sharing some
resources needed by the digital resampler 122 and the carrier phase rotator 124.
In one embodiment, a lookup table is utilized to determine the angle rotation
associated with the angle (2π/N)kμ for rotation of the coefficients. In another
embodiment, an angle rotator processor is utilized. Both embodiments are
discussed further below, and the angle rotator processor is discussed in detail in
Section 5.

FIG. 10 illustrates a trigonometric interpolator 1000 that is one circuit
configuration that implements the trigonometric interpolator equations (2.9)-
(2.11), where the number of data samples is N=4. The interpolator 1000 is not
meant to be limiting, as those skilled in the arts may recognize other circuit
configurations that implement the equations (2.8)-(2.11). These other circuit
configurations are within the scope and spirit of the present invention.

The trigonometric interpolator 1000 receives input data samples having
two data samples that are to be interpolated at an offset μ (see FIG. 2). The
resulting interpolated value y(μ) represents the interpolated point 202 in FIG. 2.
The interpolator 1000 includes a delay module 1004, an adder/subtractor module
1006, an angle rotator module 1008, and an adder 1012.

The delay module 1004 includes one or more delay elements 1012. The
delay elements 1012 can be configured using known components.

The adder/subtractor module 1006 includes multiple adders (or
subtractors) 1014, where subtraction is indicated by a (-) sign.

The angle rotator module includes two angle rotators 1010. The angle
rotators 1010 can be configured using an angle rotator processor or a table lookup
(e.g., read only memory) as discussed below.

The operation of the trigonometric interpolator 1000 is discussed further
in reference to the flowchart 1700 in FIG. 17, which is discussed below.

In step 1702, the interpolator 1000 receives a set of N input data samples.
The N data samples include the two data samples that are to be interpolated at the
offset μ relative to one of the data samples, as shown in FIG. 2. In FIG. 2, the
interpolation is to be performed between y(0) and y(1) at the offset μ to determine
the interpolation value 202.

In step 1704, the delay module 1004 delays the input data samples.

In step 1706, the adder/subtractor module 1006 generates one or more
trigonometric coefficients according to the equation (2.9). In FIG. 10, the
coefficients are represented by C_0, C_1, and C_2 for N=4, where the coefficient C_1
is a complex coefficient.

In step 1708, the angle rotators 1010 rotate appropriate real and complex
coefficients in a complex plane according to the offset μ, resulting in rotated complex
coefficients. More specifically, the angle rotator 1010a rotates the real coefficient
C_2, and the angle rotator 1010b rotates the complex coefficient C_1 in the complex
plane.

In embodiments, as discussed herein, the angle rotators 1010 are table
look-ups, in which case a rotation factor is retrieved from the table lookup based
on the offset μ, where the rotation factor includes the evaluated cosine and sine
functions of (2π/N)kμ that are shown in the equations (2.21) below. The rotation
factor is then multiplied by the corresponding real or complex coefficient, to
generate the respective rotated complex coefficient. An interpolator 1800 having
a table lookup ROM 1802 and a complex multiplier 1804 is shown in FIG. 18
for illustration.

In step 1710, the adder 1012 adds together C_0, a real part of the rotated
coefficient C_1, and a real part of the rotated coefficient C_2. The adder 1012 also
scales the output of the adder as necessary according to equation (2.10). The
resulting value is the desired interpolation value at the offset μ, as represented by
point 202 in FIG. 2.

The trigonometric interpolator is not limited to the 4th degree embodiment
that is shown in FIG. 10. The trigonometric interpolator can be configured as an
N-th degree interpolator based on N data points, as represented by the equations
(2.9)-(2.11). These other N-th degree interpolators are within the scope and spirit
of the present invention. For example and without limitation, FIG. 11 illustrates
an interpolator 1100 having N=8. The trigonometric interpolator 1100 includes:
a delay module 1102, an adder/subtractor module 1104 (having two scaling
multipliers having coefficients cos(π/4)), an angle rotator module 1106, and an
adder 1108 (having a 1/8 scale factor that is not shown). The operation of the
interpolator 1100 will be understood by those skilled in the arts based on the
discussion herein.

2.4.1 Using a Lookup Table

For carrier recovery, the phase correction is generally accomplished by
looking up the sine and cosine values corresponding to the phase, then by
multiplying these values with the complex data. This requires the same operations
as the rotation of c_k by an angle (2π/N)kμ, that is:

$$
\begin{aligned}
\mathrm{Re}(c_k W_N^{-k\mu}) &= \mathrm{Re}(c_k) \cos \frac{2\pi}{N} k\mu - \mathrm{Im}(c_k) \sin \frac{2\pi}{N} k\mu \\
\mathrm{Im}(c_k W_N^{-k\mu}) &= \mathrm{Re}(c_k) \sin \frac{2\pi}{N} k\mu + \mathrm{Im}(c_k) \cos \frac{2\pi}{N} k\mu
\end{aligned}
\tag{2.21}
$$

The sine and cosine table can be used for both the resampler, as in (2.21), and the
phase rotator for carrier recovery. In embodiments, a read only memory (ROM)
is utilized as the lookup table. However, other embodiments could be utilized,
including other types of memories. An interpolator 1800 utilizing a table lookup
ROM 1802 and complex multiplier 1804 is shown in FIG. 18 for illustration. The
ROM table access time is insignificant as compared to the computation time in
other parts of the interpolator. Therefore, this method results in low hardware
complexity and low computational delay. This implementation will be referred to
as the table-lookup method.
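A software analogue of the table-lookup rotation in (2.21) might look like the following Python sketch. The table organization is an assumption made for illustration: here the table is indexed by the quantized phase as a fraction of a full turn, with a hypothetical resolution of 2^bits entries, whereas the actual ROM addressing is not specified in this section. The function names `build_table` and `rotate_by_lookup` are likewise hypothetical.

```python
import math


def build_table(bits):
    """Precompute (cos, sin) for 2**bits equally spaced phases over one turn."""
    size = 1 << bits
    return [
        (math.cos(2.0 * math.pi * m / size), math.sin(2.0 * math.pi * m / size))
        for m in range(size)
    ]


def rotate_by_lookup(c, k, mu, N, table):
    """Rotate coefficient c by the angle (2*pi/N)*k*mu using equation (2.21)."""
    size = len(table)
    # Phase as a fraction of a full turn, quantized to the table resolution.
    index = round(k * mu / N * size) % size
    cos_v, sin_v = table[index]
    re = c.real * cos_v - c.imag * sin_v
    im = c.real * sin_v + c.imag * cos_v
    return complex(re, im)


if __name__ == "__main__":
    table = build_table(10)
    print(rotate_by_lookup(3 + 4j, 1, 0.5, 4, table))
```

In this sketch the same table could serve any consumer that needs a rotation by a quantized phase, which mirrors the resource sharing between the resampler and the carrier phase rotator described above. The accuracy is limited by the table resolution whenever the phase does not fall exactly on a table entry.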



2.4.2 Using an Angle Rotation Processor

When a very low complexity implementation is desired at the expense of
a slight increase in computational delay, we propose to use an efficient structure
for angle rotation, which is described in Section 5. Based on this structure, each
angle rotator has a hardware complexity slightly greater than that of two
multipliers. In addition, a very small ROM is needed. We will subsequently refer
to this particular implementation of our algorithm as the angle-rotation method.

Thus, there are at least two choices for implementing (2.21) as well as the
phase rotator for carrier recovery. The trade-off is between complexity and speed.
In a base-station where computation power is more affordable, the table lookup
method might be used. In a hand-set, where cost is a major concern, an angle
rotation processor might be used for both the resampler and the phase rotator,
multiplexing the operations.

Now let us compare the complexity of the trigonometric resampler with
that of the Lagrange method. Table 2.1 summarizes the comparisons for several
typical N values. The numbers are based on the table-lookup method. It indicates
that, for the same filter length N, the trigonometric interpolation method needs less
hardware.

Table 2.1 Complexity and latency comparisons.

2.5 Delays in the Computation

The critical path of the Farrow structure 400 (FIG. 4) is now compared
to that of the trigonometric interpolator. The Farrow structure implements the
Lagrange interpolator as discussed above. The Farrow structure 400 is shown in
FIG. 9 (or FIG. 4), with the critical path 902 indicated. The critical path 902 for
this structure includes one scaling multiplier 904 and N - 1 data multipliers 906.

In contrast, the critical path for the trigonometric interpolator 1000 is path
1002 and it contains just one angle rotation 1010, or only one multiplier if the
table-lookup method is employed to replace the angle rotator 1010. Since the
angle rotations for various angles can be carried out in parallel, the critical path
does not lengthen as N increases.

Table 2.1 compares the computational delays for the trigonometric
interpolator with those of the Lagrange interpolator for various values of N. The
delay data for the trigonometric interpolator 1000 are based on the table-lookup
method. As shown in FIG. 10, the trigonometric interpolator (for N=4) has only
one multiplier in the critical path, whereas the Lagrange interpolator has 4
multipliers in the critical path. Therefore, the trigonometric interpolator has less
latency than the Lagrange interpolator, which is important for voice
communications.

2.6 Simplifications of the Preliminary Structures

As mentioned in Section 2.4, to produce each y(μ) we first calculate the
Fourier coefficients c_k using existing samples, according to (2.9). We then
compute Re(c_k W_N^{-kμ}) to be used in (2.11). This is accomplished either by
retrieving W_N^{-kμ} from a lookup table, followed by two real multiplications, or by
an angle-rotation processor.

2.6.1 The Simplified Structure for N=4

Let us examine the trigonometric interpolator 1000 having N=4. To
compute Re(2 c_1 W_4^{-μ}) and Re(c_2 W_4^{-2μ}), the system requires either two angle
rotators 1010 or two accesses to a lookup table.

If the input samples happened to be such that c_2 = 0, then one fewer
angle rotator, or one fewer ROM access, would be required. Of course, the
original data samples y(-1), y(0), y(1), and y(2) are not likely to have the special
property that c_2 = y(0) - y(1) + y(2) - y(-1) = 0. However, if the data samples are
changed, then the modified samples can be determined to satisfy c_2 = 0. If the
modified samples are used for interpolation, then the c_2 angle rotator 1010a can be
eliminated. However, the interpolation result will not then correspond to the
original data samples.

It seems that the data samples can be changed to attain savings in
hardware, as long as the interpolation result is fixed so that it corresponds to the
original data samples. Of course, it must also cost less in hardware to "fix" the
interpolation result than is saved in hardware by using the modified samples.

If the samples y(k) are modified to form ỹ(k) according to:

$$
\begin{aligned}
\tilde{y}(-1) &= y(-1) - K \\
\tilde{y}(0) &= y(0) \\
\tilde{y}(1) &= y(1) + K \\
\tilde{y}(2) &= y(2) + 2K
\end{aligned}
\tag{2.22}
$$

then the K value can be adjusted to force the ỹ(k) samples to satisfy c̃_2 = 0, where
K is the slope of a straight line 1202 in FIG. 12.

To find K, the c̃_2 value that corresponds to the modified samples is
determined according to:

$$
\begin{aligned}
\tilde{c}_2 &= \tilde{y}(0) - \tilde{y}(1) + \tilde{y}(2) - \tilde{y}(-1) \\
&= y(0) - (y(1) + K) + (y(2) + 2K) - (y(-1) - K) \\
&= 2K - (y(1) + y(-1) - y(0) - y(2)).
\end{aligned}
\tag{2.23}
$$

To force c̃_2 = 0 requires:

$$ K = \frac{1}{2}\,(y(1) + y(-1) - y(0) - y(2)). \tag{2.24} $$

Therefore, the c̃_2 angle-rotator can be eliminated, and c̃_0 and c̃_1 are determined
accordingly as

$$
\begin{aligned}
\tilde{c}_0 &= 2\,(y(1) + y(-1)) \\
\tilde{c}_1 &= [2y(0) - y(1) - y(-1)] + j[-2y(1) + y(0) + y(2)].
\end{aligned}
\tag{2.25}
$$

Then, the interpolated sample is

$$ \tilde{y}(\mu) = \frac{1}{2} \mathrm{Re}\left( \frac{1}{2}\tilde{c}_0 + \tilde{c}_1\, e^{j\frac{\pi}{2}\mu} \right). \tag{2.26} $$

However, ỹ(μ) should be adjusted so that it corresponds to the original
samples. From FIG. 12, the values expressing the difference between the original
and the modified samples lie on the straight line 1202. From FIG. 13, it follows
that the offset due to the modification of samples is Kμ. Therefore, the
ỹ(μ) value can be compensated by:

$$ y(\mu) = \tilde{y}(\mu) - K\mu. \tag{2.27} $$
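The chain from (2.24) through (2.27) can be sketched in Python as follows. This is a behavioral illustration only, assuming real-valued samples; it does not reproduce the rescaled coefficients of the hardware in FIG. 14, and the function name `simplified_interp_n4` is hypothetical.

```python
import cmath
import math


def simplified_interp_n4(ym1, y0, y1, y2, mu):
    """Interpolate between y(0) and y(1) at offset mu with c_2 forced to zero.

    ym1, y0, y1, y2 are the samples y(-1), y(0), y(1), y(2).
    """
    # Equation (2.24): slope K that forces the modified c_2 to zero.
    K = 0.5 * (y1 + ym1 - y0 - y2)
    # Equation (2.25): coefficients of the modified samples.
    c0 = 2.0 * (y1 + ym1)
    c1 = complex(2.0 * y0 - y1 - ym1, -2.0 * y1 + y0 + y2)
    # Equation (2.26): a single rotation, by the angle (pi/2)*mu.
    rotated = c1 * cmath.exp(1j * math.pi * mu / 2.0)
    y_mod = 0.5 * (0.5 * c0 + rotated.real)
    # Equation (2.27): compensate for the sample modification.
    return y_mod - K * mu


if __name__ == "__main__":
    # Samples on the straight line 3 + 2*t at t = -1, 0, 1, 2.
    print(simplified_interp_n4(1.0, 3.0, 5.0, 7.0, 0.25))
```

One consequence of removing the linear component before interpolating is that samples lying on a straight line are interpolated exactly: the modified samples become constant, c̃_1 vanishes, and the Kμ correction restores the slope.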

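Using equations (2.25), (2.26) and (2.27) leads to an interpolator 1400
as shown in FIG. 14. This simplified interpolator structure is not limited to N=4
configurations. In fact, this simplification technique can be applied to an
interpolator with an arbitrary N value. To eliminate the angle-rotation needed by
Re(c_{N/2} e^{jπμ}) in (2.11), the samples are modified according to

$$ \tilde{y}(n) = y(n) + nK, \quad n = \text{integer}. \tag{2.28} $$

Using (2.9), and then applying (2.28), results in

$$ \tilde{c}_{N/2} = \sum_{n=-N/2+1}^{N/2} (-1)^n \tilde{y}(n) = \sum_{n=-N/2+1}^{N/2} (-1)^n y(n) + K \sum_{n=-N/2+1}^{N/2} (-1)^n n = \sum_{n=-N/2+1}^{N/2} (-1)^n y(n) + \frac{N}{2} K, \tag{2.29} $$

where the last step uses the fact that the sum of (-1)^n n over the base point set
equals N/2 when N/2 is even (e.g., N = 4 or N = 8). If we choose

$$ K = \frac{2}{N} \sum_{n=-N/2+1}^{N/2} (-1)^{n+1} y(n) \tag{2.30} $$

we can force c̃_{N/2} = 0.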



Referring to FIG. 14, the interpolator 1400 includes the delay module
1004, an adder/subtractor module 1402, the angle rotator 1010b, a multiplier
1404, and an adder 1406. The interpolator 1400 has a simplified structure when
compared to the interpolator 1000 (in FIG. 10), as the interpolator 1400 replaces the
angle rotator 1010a with a multiplier 1404. As discussed above, this can be done
because the coefficient c_{N/2} = 0 (C_2 = 0 for N=4) by modification of the data samples,
and therefore there is no need to have an angle rotator for C_{N/2}. The operation of
the interpolator 1400 is further discussed in reference to the flowchart 1900 that
follows.

In step 1902, the interpolator 1400 receives a set of N input data samples.
The N data samples include two of the data samples that are to be interpolated at
the offset μ, as shown in FIG. 2.

In step 1904, the delay module 1004 delays the input data samples.

In step 1906, the adder/subtractor module 1402 modifies one or more of
the data samples so that a coefficient c_{N/2} is 0. In embodiments, the data samples
are modified according to y(n)_mod = y(n) + n·K, wherein K is determined by the
equation (2.30) above so that c_{N/2} is 0, and wherein y(n) represents the n-th data
sample of the N data sample set.

In step 1908, the adder/subtractor module 1402 generates one or more
trigonometric coefficients according to modifications to the equation (2.8). In the
N=4 case, equations (2.25) are implemented by the module 1402. In FIG. 14, for
N=4, the coefficients are represented by C_0 and C_1, where the coefficient C_1 is a
complex coefficient. By comparing with FIG. 10, it is noted that the C_2
coefficient is 0. Additionally, the adder/subtractor module 1402 outputs the K
value for further processing. Notice also that in FIG. 14, the output scaling factor
has been changed from 1/4 to 1/2. This reflects several other straightforward
simplifications that have been made to module 1402 and in the angle rotator 1010b.
In embodiments, the steps 1906 and 1908 are performed simultaneously by the
adder/subtractor module 1402, as will be understood by those skilled in the
relevant arts.

In step 1910, the angle rotator 1010b rotates the complex coefficient C_1
in a complex plane according to the offset μ, resulting in a rotated complex
coefficient. In embodiments, as discussed herein, the angle rotator 1010b is a table
look-up, in which case a complex rotation factor is retrieved from the table lookup
based on the offset μ, and the resulting rotation factor is then multiplied by the
corresponding complex coefficient, to generate the respective rotated complex
coefficient. The rotation factor includes the evaluation of the cosine and sine
factors that are shown in equations (2.21). Note that since C_2 = 0, the angle rotator
1010a is replaced with the multiplier 1404.

In step 1912, the multiplier 1404 multiplies the K factor by the offset
μ, to produce a Kμ factor.

In step 1914, the adder 1406 adds together C_0, Kμ, and a real part of
the rotated complex coefficient C_1, and scales the sum by the trivial factor 1/2, to
produce the desired interpolation value. The addition of the Kμ factor
compensates the desired interpolation value for the modification that was done to
the data samples in order to force c_{N/2} to zero in step 1906.

The simplified trigonometric interpolator is not limited to the four sample
embodiment that is shown in FIG. 14. The simplified trigonometric interpolator
can be configured as an N sample interpolator based on N data points, as
represented by the equations (2.28)-(2.30). These other N-sample interpolators
are within the scope and spirit of the present invention. For example and without
limitation, an interpolator with N=8 is discussed below.

2.6.2 The Simplified Structure for N=8 

According to (2.30), we choose

$$K = \tfrac{1}{4}\bigl(y(1) + y(3) + y(-3) + y(-1) - y(0) - y(2) - y(4) - y(-2)\bigr). \tag{2.31}$$

The coefficient values can be computed following this procedure:

$$p_1 = \bigl(4K + y(1) - y(-3)\bigr)\cos\frac{\pi}{4}$$

$$p_2 = \bigl(4K + y(3) - y(-1)\bigr)\cos\frac{\pi}{4}$$

$$\begin{aligned}
c_0 &= 2\bigl(y(1) + y(3) + y(-3) + y(-1)\bigr) \\
c_1 &= [\,y(0) - y(4) - 4K + p_1 - p_2\,] + j[\,y(-2) - y(2) - 4K - p_1 - p_2\,] \\
c_2 &= [\,4K + y(0) + y(4) - y(2) - y(-2)\,] + j[\,4K + y(3) + y(-1) - y(1) - y(-3)\,] \\
c_3 &= [\,y(0) - y(4) - 4K - p_1 + p_2\,] + j[\,-y(-2) + y(2) + 4K - p_1 - p_2\,] \\
c_4 &= 0.
\end{aligned} \tag{2.33}$$
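
The closed-form expressions (2.31)-(2.33) can be checked numerically. The following is a minimal sketch, not the patented circuit: it assumes that the "modified" samples are y(n) + Kn for n = -3, ..., 4 and that the coefficients follow the DFT convention c_k = Σ (y(n) + Kn) exp(-j2πkn/8). Both conventions are inferred from the equations themselves, since Section 2.6 is not reproduced here, and the function names are hypothetical.

```python
import cmath
import math

def simplified_coeffs_n8(y):
    """Closed-form K and c_0..c_4 of (2.31)-(2.33); y maps n = -3..4 to a real sample."""
    K = 0.25 * (y[1] + y[3] + y[-3] + y[-1] - y[0] - y[2] - y[4] - y[-2])
    cos45 = math.cos(math.pi / 4)
    p1 = (4 * K + y[1] - y[-3]) * cos45
    p2 = (4 * K + y[3] - y[-1]) * cos45
    c0 = complex(2 * (y[1] + y[3] + y[-3] + y[-1]), 0.0)
    c1 = complex(y[0] - y[4] - 4 * K + p1 - p2, y[-2] - y[2] - 4 * K - p1 - p2)
    c2 = complex(4 * K + y[0] + y[4] - y[2] - y[-2],
                 4 * K + y[3] + y[-1] - y[1] - y[-3])
    c3 = complex(y[0] - y[4] - 4 * K - p1 + p2, -y[-2] + y[2] + 4 * K - p1 - p2)
    return K, [c0, c1, c2, c3, 0j]

def dft_of_modified_samples(y, K):
    """Assumed convention: c_k = sum over n of (y(n) + K*n) * exp(-j*2*pi*k*n/8)."""
    return [sum((y[n] + K * n) * cmath.exp(-2j * math.pi * k * n / 8)
                for n in range(-3, 5))
            for k in range(5)]
```

Under these assumptions the two functions agree to rounding error for any real samples, and the directly computed c_4 vanishes, which is the purpose of the K modification.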

The resulting modified structure for N=8 is shown in FIG. 15 as interpolator 1500. Similar to the interpolator 1400, the interpolator 1500 includes a delay module 1504, an adder/subtractor module 1506, an angle rotator module 1508, a multiplier 1510, and an output scaling adder 1512. As in the interpolator 1400, the multiplier 1510 substantially replaces an angle rotator module. As in the interpolator 1100 for N=8 (FIG. 11), only two non-trivial scaling multiplications are needed for the modified structure 1500.

2.6.3 Performance Comparisons with Other Structures

How does the simplified interpolator 1400 (FIG. 14) perform as compared to the interpolator 1000 (FIG. 10)? FIGs. 16A-C show the frequency responses, in solid lines, of the Lagrange cubic interpolator 400 (FIG. 4), the interpolator 1000 (FIG. 10) and the simplified interpolator 1400 (FIG. 14), respectively. For an input signal whose spectrum is a raised cosine with α=0.4, as shown in dashed lines, the amount of interpolation error corresponds to the gray areas. Clearly, the interpolator 1400 produces less error than the Lagrange cubic interpolator 400 and the interpolator 1000. (FIG. 16D will be discussed in Section 4.)

Next, let us verify this performance improvement by interpolating two signals: Signal 1, which is the same as signal 802 in FIG. 8A (α=1.0), and Signal 2, which is signal 1602 in FIG. 16 (α=0.4). Then, the NMSE values are compared. We use three interpolators: 1) the Lagrange cubic interpolator 400; 2) the trigonometric interpolator 1000; and 3) the trigonometric interpolator 1400. Also compared is the number of multipliers.

The results in Table 2.2 show that the modified structure for N = 4 not only requires less hardware but also obtains the highest accuracy among the three methods for the practical signals used in our simulation.

Table 2.2 Comparison of interpolators for N=4.

                               Lagrange    Structure in    Structure in
                               cubic       FIG. 10         FIG. 14
NMSE for Signal 1 in dB        -25.80      -28.45
NMSE for Signal 2 in dB
Nontrivial scaling
Data multipliers
Multipliers in critical path

* The trigonometric interpolator 1000 employs two one-output angle-rotators, each having the hardware equivalent of slightly more than two multipliers. The trigonometric interpolator 1400 employs one such angle-rotator and one multiplier, yielding an equivalent of slightly more than three multipliers.

2.7 Trigonometric Interpolator Application

An important application of the interpolation method and apparatus 
described in this patent is the following. It is often necessary to increase the 
sampling rate of a given signal by a fixed integer factor. For example, a signal 
received at a rate of 1000 samples per second might need to be converted to one 
at a rate of 4000 samples per second, which represents an increase of the sampling 
rate by the integer factor four. There are methods in common practice for doing 
such a conversion. One method is a very popular two-step process wherein the 
first step creates a signal at the desired higher sampling rate but one where simple 
zero-valued samples (three of them in the example situation just mentioned) are 
inserted after each input data value. The second step in the interpolation scheme 
is to pass this "up-sampled" or "data-rate expanded" signal through an 
appropriately designed lowpass digital filter which, in effect, smoothes the signal 
by "filling in" data values at the previously inserted zero-valued samples. In the 
process of doing this filtering operation it might or might not be important that the 
original data samples remain intact, and when this is important there exist certain 
special lowpass filters that will not alter those samples.

We can easily adapt the trigonometric interpolator described herein to efficiently create such a sampling rate conversion system, but one that does not require such filtering operations. If we denote the integer factor by which we desire to increase the data rate as L (in the above example, L = 4), we proceed as follows. We build the system 7800 shown below in Fig. 78A. System 7800 includes a Delay Module 7802 and an Add/Subtract Module 7804 (similar to those in FIG. 10) that can accommodate incoming data at a rate r. We now build L copies of the Angle-Rotation Module 7806 (similar to that in FIG. 10), with each one being fed by the same outputs of the Add/Subtract Module. Within each of these L Angle-Rotation Modules 7806 we fix the μ value; that is, each one has a different one of the values: 1/L, 2/L, ..., (L-1)/L. With such fixed μ values, each Angle-Rotation Module 7806 can be constructed as a set of fixed multipliers (a very special case of the table-lookup method), although any of the Angle-Rotation Module implementations previously discussed can be employed.

As shown in Fig. 78, the L-1 outputs, i.e., the interpolated samples that are offset by the values 1/L, 2/L, ..., (L-1)/L from the first of the two data points (indicated as μ = 0 and μ = 1 in the Delay Module of Fig. 78A) are routed to a multiplexer 7808, along with the input data point from which all interpolated samples are offset. The multiplexer 7808 simply selects these samples, in sequence, and provides them to the output at the expanded data rate L×r.
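
The data flow of such a rate expander can be sketched in software. The example below is a behavioral sketch only, not the hardware of system 7800: it evaluates the interpolated values directly from the impulse response of the length-N trigonometric interpolator (with its 1/N normalization) rather than through the Add/Subtract and Angle-Rotation Modules. The function names `trig_kernel` and `upsample` are hypothetical.

```python
import math

def trig_kernel(t, N):
    """Impulse response of the length-N trigonometric interpolator (zero for |t| > N/2)."""
    if abs(t) > N / 2:
        return 0.0
    s = 1.0 + math.cos(math.pi * t)
    s += 2.0 * sum(math.cos(2 * math.pi * k * t / N) for k in range(1, N // 2))
    return s / N

def upsample(x, L, N=4):
    """Emit each usable input sample followed by L-1 interpolated values."""
    half = N // 2
    out = []
    # i indexes the sample at mu = 0; the window covers x[i-half+1 .. i+half]
    for i in range(half - 1, len(x) - half):
        window = {n: x[i + n] for n in range(-half + 1, half + 1)}
        for m in range(L):                      # m = 0 reproduces the input sample
            mu = m / L
            out.append(sum(v * trig_kernel(mu - n, N) for n, v in window.items()))
    return out
```

Because every offset μ = m/L is fixed, the kernel values `trig_kernel(mu - n, N)` are constants that could be precomputed once per branch, which mirrors the fixed-multiplier construction described above; the inner loop over m plays the role of the multiplexer.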

A major advantage of the system 7800 is that all of the system's components are operated at the (slow) input data rate except the output multiplexer 7808. If desired, it would also be possible to employ fewer Angle-Rotation Modules 7806, but operating them at a higher data rate, and using several μ values, sequentially, for each. This would result in a system that employed less hardware but one that traded off the hardware savings for a higher data rate operation of such modules.

2.8 Trigonometric Interpolator Summary

In this Section we have described an interpolation method that we have devised that uses trigonometric series for interpolation. Comparing the interpolations using the trigonometric polynomial and the Lagrange polynomial of the same degree, the trigonometric-based method achieves higher interpolation accuracy while simultaneously reducing the computation time and the amount of required hardware. Moreover, the trigonometric-based method performs operations that are similar to those of a phase rotator for carrier phase recovery. This allows a reduction in the overall synchronization circuit complexity by sharing resources.

This trigonometric interpolator yields less computational delay than algebraic interpolators. To achieve the same throughput rate, this translates into more savings in hardware using the proposed structure, because the data registers that are required by algebraic interpolators to pipeline the computation for a faster rate would not be needed by our structure.

We have also introduced two implementations of the trigonometric 
interpolation method: one using a lookup table, and one using an angle rotation 
processor (to be discussed in Section 5). 

After introducing a first interpolation method, we have shown that we can trade one angle rotator for a multiplier by conceptually modifying the input samples, then by "correcting" the interpolated value obtained from the "modified" samples. Through this modification, we have obtained a simpler implementation structure while simultaneously improving the performance when interpolating most practical signals. This performance improvement has been demonstrated by comparing the frequency responses of the interpolators and the mean-squared interpolation errors using these interpolators. Our discussion of the optimal digital resampler in Section 4 will be based on this simplified interpolator.






3. Interpolation Filters with Arbitrary Frequency Response 

In Section 2, an interpolation method using a trigonometric polynomial was introduced, along with an example of such an interpolation structure of length N=4. In addition to being a very simple structure, our analyses and simulations also demonstrated that the trigonometric interpolator outperformed the interpolator of the same length using a Lagrange polynomial. In this Section, a more systematic approach will be taken to analyze this method from the digital signal processing point of view. A formula giving its impulse response allows us to analyze the frequency response of the interpolation filter. We then show how to modify the algorithm to achieve arbitrary frequency response. The discussions in this Section will provide the framework for the optimal interpolator of Section 4.

3.1 Formulating the Trigonometric Interpolator as an Interpolation Filter

We have shown that, given N equally-spaced samples y(n), a continuous-time signal can be reconstructed as

$$y(t) = \sum_{n=-N/2+1}^{N/2} y(n)\,f(t-n) \tag{3.1}$$

where f(t) is the impulse response of a continuous-time interpolation filter. As in Section 2, it is assumed that the sampling period is T_s = 1. This assumption makes the notation simpler, and the results can easily be extended to an arbitrary T_s. In other words, the invention is not limited to a sampling period of T_s = 1, as other sampling periods could be utilized. In Section 2 we have shown that f(t) can be expressed as:

$$f(t) = \begin{cases} \dfrac{1}{N}\left[1 + 2\displaystyle\sum_{k=1}^{N/2-1}\cos\frac{2\pi}{N}kt + \cos\pi t\right], & |t| \le N/2 \\[2ex] 0, & |t| > N/2. \end{cases} \tag{3.2}$$

FIG. 20 illustrates f for the trigonometric interpolation filters for N=8 and N=16. By computing the Fourier transform of f, we obtain the frequency response of the interpolation filter. The frequency responses for the N=8 and N=16 cases are plotted in FIG. 21. Since f(t) is real and symmetric around t=0, its frequency response has zero phase. In FIG. 21, the oscillatory behavior near the band edge is quite obvious. In addition, by comparing FIGs. 21a and b, we can see that as the filter length is increased from N=8 to N=16 the amount of ripple does not decrease. This is the well-known Gibbs phenomenon: the magnitude of the ripples does not diminish as the duration of the impulse response is increased.

It is apparent that the amount of oscillation cannot be reduced using the 
method discussed thus far while only increasing the filter length N. Moreover, it 
seems that an arbitrary frequency response cannot be achieved using this method. 
To address these problems, let us examine how the frequency response of this 
method is determined. 

3.2 Analysis of the Frequency Response 

Let us examine how the frequency response of f for the trigonometric interpolator is obtained using the example in FIG. 22 with N=8. According to (3.2), the interpolation filter's impulse response f(t) on the interval -N/2 ≤ t ≤ N/2 is a weighted sum of cosine functions. We can view the finite-length filter f in (3.2) as having been obtained by applying a window 2206 according to the following:

$$w(t) = \begin{cases} 1, & |t| \le N/2 \\ 0, & |t| > N/2 \end{cases} \tag{3.3}$$

to an infinitely-long, periodic function 2204 with period N:

$$f_c(t) = \frac{1}{N}\left[1 + 2\sum_{k=1}^{N/2-1}\cos\frac{2\pi}{N}kt + \cos\pi t\right] = \frac{1}{N}\left[\sum_{k=-N/2+1}^{N/2-1}\cos\frac{2\pi}{N}kt + \cos\pi t\right], \quad -\infty < t < \infty \tag{3.4}$$

such that

$$f(t) = f_c(t)\,w(t), \quad -\infty < t < \infty. \tag{3.5}$$

Thus F, the frequency response of f, can be obtained by convolving F_c and W, the Fourier transforms of f_c and w, respectively.

The Fourier transform of the periodic function f_c(t), -∞ < t < ∞, is

$$F_c(\Omega) = \sum_{k=-N/2+1}^{N/2-1}\delta\!\left(\Omega - \frac{2\pi}{N}k\right) + \frac{1}{2}\,\delta(\Omega - \pi) + \frac{1}{2}\,\delta(\Omega + \pi) \tag{3.6}$$

which consists of a sequence of impulses 2208. We will subsequently refer to the weights of these impulses as frequency samples. Denoting the weight of δ(Ω - 2πk/N) by $\hat F(k)$, we have

$$F_c(\Omega) = \sum_{k=-M}^{M}\hat F(k)\,\delta\!\left(\Omega - \frac{2\pi}{N}k\right) \tag{3.7}$$

where M ≥ N/2 is an integer. In the case of (3.6), M = N/2. For our particular interpolation filter, according to (3.6), all in-band frequency samples $\hat F(k) = 1$ for |k| < N/2. For |k| > N/2, the out-of-band samples $\hat F(k) = 0$. The two samples in the transition band are $\hat F(N/2) = \hat F(-N/2) = 1/2$. The transition bandwidth is determined by the distance between the last in-band and the first out-of-band frequency samples.

Since w is a rectangular function, W must be a sinc function 2210. Convolving F_c and the sinc function W simply interpolates the frequency samples to obtain F(Ω), -∞ < Ω < ∞, shown as response 2212. (Here we have plotted the symmetric F only on the positive half of the Ω axis.) We thus have

$$\hat F(k) = F(\Omega)\big|_{\Omega = 2\pi k/N}. \tag{3.8}$$

From response 2212, the continuous-frequency response F(Ω) is uniquely determined by an infinite number of equally spaced frequency samples $\hat F(k)$. If we modify the frequency samples 2214 near the passband edge to let the transition between the passband and stopband be more gradual, as depicted in FIG. 23, then the ripple is decreased. FIG. 23 demonstrates gradually reduced samples 2302, and the reduction of ripples in the overall response 2304, as compared to the response 2212 in FIG. 22. The cost of this improvement is an increased transition bandwidth in the response 2304, as compared to the response 2212.
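
The ripple reduction can be illustrated numerically. The sketch below assumes that the sinc function is normalized so that it interpolates the frequency samples exactly, i.e. F at Ω = 2πx/N equals the sum of the samples times sin(π(x-k))/(π(x-k)); this normalization and the tapered weights (0.9, 0.5, 0.1) are illustrative assumptions, not values taken from FIG. 23.

```python
import math

def sinc(u):
    """sin(pi*u)/(pi*u), with the removable singularity at u = 0."""
    return 1.0 if abs(u) < 1e-12 else math.sin(math.pi * u) / (math.pi * u)

def response(x, samples):
    """F at Omega = 2*pi*x/N from symmetric frequency samples {k: weight}, k >= 0."""
    total = samples.get(0, 0.0) * sinc(x)
    for k, w in samples.items():
        if k > 0:
            total += w * (sinc(x - k) + sinc(x + k))
    return total

def peak_ripple(samples, x_lo, x_hi, steps=2000):
    """Largest |F| on a uniform grid of x in [x_lo, x_hi]."""
    return max(abs(response(x_lo + (x_hi - x_lo) * i / steps, samples))
               for i in range(steps + 1))

original = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 0.5}           # weights of (3.6), N = 8
tapered = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.9, 4: 0.5, 5: 0.1}    # illustrative taper
```

Comparing `peak_ripple(original, 6, 12)` with `peak_ripple(tapered, 6, 12)` shows the smaller out-of-band ripple of the tapered samples, at the price of a transition band that now extends to k = 5.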

If a narrower transition band is desired, we can increase the duration of the filter f(t). This can be seen by comparing FIG. 24, where N=16, with response 2304 in FIG. 23, in which N=8.

3.3 Implementing the Modified Algorithm

By comparing (3.6) and (3.4) we can see that the frequency sample values, i.e., the weights of the impulses in (3.6), are determined by the weights in the sum in (3.4).

We can modify our original interpolation filter in (3.2) for |t| ≤ N/2 as

$$f_m(t) = \frac{1}{N}\left[\hat F(0) + 2\sum_{k=1}^{M}\hat F(k)\cos\frac{2\pi}{N}kt\right]. \tag{3.9}$$

By expressing (3.9) using the W_N notation, for |t| ≤ N/2 we have

$$f_m(t) = \frac{1}{N}\left[\hat F(0)\,W_N^{0\cdot t} + \sum_{k=1}^{M}\hat F(k)\left(W_N^{kt} + W_N^{-kt}\right)\right] = \frac{1}{N}\sum_{k=-M}^{M}\hat F(k)\,W_N^{-kt} \tag{3.10}$$

and f_m(t) = 0 for |t| > N/2. Substituting this result into (3.1), and re-ordering terms, we have

$$y(t) = \frac{1}{N}\sum_{k=-M}^{M}\hat c_k\,W_N^{-kt} \tag{3.11}$$

where

$$\hat c_k = \hat F(k)\sum_{n=-N/2+1}^{N/2} y(n)\,W_N^{kn}. \tag{3.12}$$

By comparing (3.12) to (2.9), we can see that, for k = -N/2+1, ..., N/2,

$$\hat c_k = \hat F(k)\,c_k. \tag{3.13}$$
Thus, a modified algorithm can be implemented as the following steps:

Step 1': (Same as Step 1 in Section 2.4.) Given an even number of samples N, calculate the Fourier coefficients c_k, k = 0, ..., N/2, using (2.9).

Step 2': Multiply the coefficients c_k by scale factors $\hat F(k)$ using (3.13).

Step 3': Given a fractional delay μ, compute the synchronized samples using (3.11), which, due to $\hat c_k^{\,*} = \hat c_{-k}$, can be simplified as:

$$y(\mu) = \frac{1}{N}\left[\hat c_0 + 2\,\mathrm{Re}\!\left(\sum_{k=1}^{M}\hat c_k\,W_N^{-k\mu}\right)\right]. \tag{3.14}$$

It seems that, in Step 3', we need coefficients $\hat c_k$ (hence c_k) not only for k ≤ N/2 but also for k > N/2, while only the c_k values for k ≤ N/2 are computed in Step 1'. However, c_k values for k > N/2 can be obtained using

$$c_k = c_{k-mN} \tag{3.15}$$

where m is an integer chosen such that |k - mN| ≤ N/2; a resulting negative index is handled through the conjugate symmetry noted in Step 3'. Relation (3.15) holds because c_k is periodic in k with period N, being obtained from the Fourier transform of the discrete-time signal y(n), -N/2+1 ≤ n ≤ N/2.
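
Steps 1' through 3' can be sketched as follows. This is a minimal illustration under stated assumptions: the coefficients are taken as c_k = Σ y(n) exp(-j2πkn/N) over n = -N/2+1, ..., N/2 (the convention assumed here for (2.9), which is not reproduced in this Section), the samples and weights are real, and the function names are hypothetical.

```python
import cmath
import math

def fourier_coeffs(y, N):
    """Step 1': c_k for k = 0..N/2, assuming c_k = sum_n y(n) exp(-j*2*pi*k*n/N)."""
    return [sum(y[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(-N // 2 + 1, N // 2 + 1))
            for k in range(N // 2 + 1)]

def coeff(c, k, N):
    """c_k for any integer k, using periodicity in k and c_{-k} = conj(c_k)."""
    r = k % N                                   # 0 <= r < N
    return c[r] if r <= N // 2 else c[N - r].conjugate()

def interpolate(y, mu, weights, N):
    """Steps 2' and 3': weights = {k: scale factor}, k = 0..M; y maps n to a real sample."""
    c = fourier_coeffs(y, N)
    acc = weights[0] * c[0].real
    for k, w in weights.items():
        if k > 0:
            rot = cmath.exp(2j * math.pi * k * mu / N)   # rotation by the angle 2*pi*k*mu/N
            acc += 2.0 * (w * coeff(c, k, N) * rot).real
    return acc / N
```

For N = 8, the weights `{0: 1, 1: 1, 2: 1, 3: 1, 4: 0.5}` reproduce the unmodified interpolator, while a tapered set such as `{0: 1, 1: 1, 2: 1, 3: 0.9, 4: 0.5, 5: 0.1}` (illustrative values) exercises the k > N/2 case that relies on the periodicity of c_k.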

At this point, we have shown that the continuous-time frequency response of the interpolation filter having impulse response f(t) can be improved by modifying the weights $\hat F(k)$ in (3.10). Now a question arises: the modification of the weights would alter the shape of the impulse response of the f(t) filter. How do we guarantee that the resulting filter does not change the original samples?

3.4 Conditions for Zero ISI

In order for f(t) not to alter the original samples when used for interpolation as in (3.1), it must have zero-crossings at integer multiples of the sampling period:

$$f(n) = \begin{cases} 1, & n = 0 \\ 0, & n \ne 0,\ n \text{ an integer.} \end{cases} \tag{3.16}$$

The well-known Nyquist condition for zero ISI (Proakis, J.G., Digital Communications, McGraw-Hill, New York, NY (1993)) states that the necessary and sufficient condition for (3.16) is

$$\sum_{n=-\infty}^{\infty} F(\Omega - 2\pi n) = 1, \quad -\infty < \Omega < \infty. \tag{3.17}$$

Since the filter's impulse response f(t) has a finite duration, i.e., f(t) = 0 for |t| > N/2, (3.16) holds if and only if the frequency samples $\hat F(k)$ satisfy

$$\sum_{n=-\infty}^{\infty}\hat F(k - Nn) = 1, \quad k \text{ an integer.} \tag{3.18}$$

The proof is given in Appendix A.

In summary, we can still guarantee that the modified interpolation filter f does not alter the original samples as long as the modified weights $\hat F(k)$ (frequency samples) satisfy (3.18). Using this constraint, we can design the weights $\hat F(k)$ to meet given frequency response requirements.
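
As a worked instance of (3.18), consider N = 8 with real, symmetric weights that vanish for |k| > 5 (an illustrative assumption). At k = 3 and k = 4 only two terms of the sum are nonzero:

$$\hat F(3) + \hat F(3 - 8) = \hat F(3) + \hat F(5) = 1, \qquad \hat F(4) + \hat F(4 - 8) = 2\hat F(4) = 1.$$

Hence $\hat F(4) = 1/2$, any change to $\hat F(3)$ must be offset by $\hat F(5) = 1 - \hat F(3)$, and for k = 0, 1, 2 the constraint leaves $\hat F(k) = 1$ because the shifted samples $\hat F(k \pm 8)$ are zero.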

3.5 Optimization Algorithm

Using the approach discussed, one can approximate an arbitrary frequency response by choosing appropriate weights $\hat F(k)$. For example, a desired frequency response F_d(Ω) for an interpolation filter should be unity in the passband and be zero in the stopband, as

$$F_d(\Omega) = \begin{cases} 1, & |\Omega| \le \pi \\ 0, & |\Omega| > \pi. \end{cases} \tag{3.19}$$

The interpolation error using our interpolation filter is defined as

$$e(\Omega) = W_t(\Omega)\,\bigl|F_d(\Omega) - F(\Omega)\bigr| \tag{3.20}$$

where W_t(Ω) is a weighting function.

From Section 3.2 we have F(Ω) = F_c(Ω) ⊛ sinc(Ω). Thus, we can express F(Ω) in terms of $\hat F(k)$, using (3.7), as

$$F(\Omega) = \left[\sum_{k=-M}^{M}\hat F(k)\,\delta\!\left(\Omega - \frac{2\pi}{N}k\right)\right] \circledast \mathrm{sinc}(\Omega) = \sum_{k=-M}^{M}\hat F(k)\,\mathrm{sinc}\!\left(\Omega - \frac{2\pi}{N}k\right). \tag{3.21}$$

An optimal interpolation filter can be designed by choosing $\hat F(k)$ to minimize the peak interpolation error, as

$$L_\infty = \max\{e(\Omega)\} \tag{3.22}$$

or the mean-squared interpolation error

$$L_2 = \int_{-\infty}^{\infty} e^2(\Omega)\,d\Omega \tag{3.23}$$

subject to the constraint described by (3.18).
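
A one-parameter version of this constrained design can be sketched as a simple grid search. The sketch below is an illustration only, not the optimization procedure of the invention: it assumes a sinc normalized to interpolate the frequency samples exactly, measures frequency in units of the sample spacing (x = NΩ/2π), uses a 0/1 weighting function with hypothetical band edges, and searches only the pair of weights next to the band edge so that (3.18) holds for every candidate.

```python
import math

def sinc(u):
    return 1.0 if abs(u) < 1e-12 else math.sin(math.pi * u) / (math.pi * u)

def response(x, samples):
    """F at Omega = 2*pi*x/N from symmetric samples {k: weight}, k >= 0."""
    return sum(w * (sinc(x - k) + (sinc(x + k) if k else 0.0))
               for k, w in samples.items())

def peak_error(samples, pass_edge, stop_edge, x_max, steps=400):
    """Peak weighted error on a grid: weight 1 in pass/stop bands, 0 in between."""
    worst = 0.0
    for i in range(steps + 1):
        x = x_max * i / steps
        if x <= pass_edge:
            worst = max(worst, abs(1.0 - response(x, samples)))
        elif x >= stop_edge:
            worst = max(worst, abs(response(x, samples)))
    return worst

def search_band_edge_weight(N, pass_edge, stop_edge, x_max, grid=50):
    """Try weight a at k = N/2-1 and 1-a at k = N/2+1; return (best a, its peak error)."""
    best = None
    for i in range(grid + 1):
        a = 0.5 + 0.5 * i / grid                 # a runs from 0.5 to 1.0
        samples = {k: 1.0 for k in range(N // 2 - 1)}
        samples.update({N // 2 - 1: a, N // 2: 0.5, N // 2 + 1: 1.0 - a})
        err = peak_error(samples, pass_edge, stop_edge, x_max)
        if best is None or err < best[1]:
            best = (a, err)
    return best
```

A call such as `search_band_edge_weight(8, 2.0, 6.0, 12.0)` returns the best value found on the grid together with its peak error; since a = 1 (the unmodified filter) is one of the candidates, the result can never be worse than the original weights under the chosen error measure.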

By examining FIGS. 23 and 24, we can see that, by modifying only two frequency samples, those nearest the band edge, a significant improvement is achieved. In these cases we have $\hat F(k) = 0$ for |k| > N/2+1.

3.6 Conclusion

In this Section, an interpolation method was presented that achieves 
arbitrary frequency response by modifying the trigonometric interpolator discussed 
in Section 2. Using this approach, the performance of a trigonometric interpolation 
filter can be further improved. 

It is interesting to note that this procedure is equivalent to the well-known 
filter design method using windows. FIG. 25a depicts the impulse responses of the 
original filter (3.2) as the dashed line, and the modified filter (3.9) as the solid line. 
By comparing the two impulse responses, we have found a function illustrated in 
FIG. 25b. If we multiply the original impulse response by this function, we get the 
impulse response that we obtained by modifying the frequency samples. Therefore, 
this function is equivalent to a window. According to this interpretation, our 
frequency domain design method is equivalent to designing a better window than 
the rectangular window (3.3) in the time domain. 

4. Design of Optimal Resamplers 

4.1 Motivation

We have thus far discussed digital resampling using interpolation methods. To accomplish this, we conceptually reconstruct the continuous-time signal by fitting a trigonometric polynomial to the existing samples and then resample the reconstructed signal by evaluating this polynomial for a given sampling mismatch (or offset) μ. The reconstruction of the continuous-time bandlimited signal y(t) from existing samples y(m) using interpolation filter f(t), according to (3.1), is

$$y(t) = \sum_{m=-N/2+1}^{N/2} y(m)\,f(t-m). \tag{4.1}$$

Then y(t) is resampled at t = μ as

$$y(\mu) = \sum_{m=-N/2+1}^{N/2} y(m)\,f(\mu - m) = \sum_{m=-N/2+1}^{N/2} y(m)\,f_\mu(m) \tag{4.2}$$

where $f_\mu(m) = f(m - \mu)$; the two sums agree because f(t) is symmetric around t = 0.

In the previous sections we approached the problem only from the point of view of reconstructing the continuous-time signal, as in (4.1), since we have only examined the frequency response of the continuous-time filter f(t). However, what we actually are interested in is the new sample y(μ) that is obtained by resampling the continuous-time signal at a new sampling instant, t = μ.

Now, a question arises: Even when the frequency response F(Ω) of the continuous-time filter is optimized as in Section 3.5, do we necessarily obtain the minimum error in producing a new sample y(μ) for each μ value?

According to (4.2), the new sample y(μ) is actually obtained by filtering the original samples y(m) using a discrete-time filter whose impulse response $f_\mu(m)$ depends on a particular delay μ. What is the desired frequency response of the $f_\mu(m)$ filter?

A digital resampler that is used in timing recovery simply compensates for the timing offset in sampling the received signal. Ideally, the $f_\mu(m)$ filter should not alter the signal spectrum as it simply delays the existing samples by μ. Obviously, the desired frequency response of the discrete-time $f_\mu(m)$ filter is

$$F_d(\omega, \mu) = e^{-j\omega\mu} \tag{4.3}$$

where ω is the normalized angular frequency. Let us define the frequency response of $f_\mu(n)$ as $F_\mu(\omega)$. The error in approximating the ideal frequency response $F_d(\omega, \mu)$ by $F_\mu(\omega)$ for a given μ value is

$$e(\omega) = W_t(\omega)\,\bigl|F_d(\omega, \mu) - F_\mu(\omega)\bigr| \tag{4.4}$$

where W_t(ω) is a weighting function.

We now examine how the discrete-time fractional-delay frequency response $F_\mu(\omega)$ is obtained. We denote by F(Ω) the Fourier transform of the continuous-time filter f(t). Hence, the Fourier transform of f(t-μ) must be $e^{-j\Omega\mu}F(\Omega)$. We know that $f_\mu(n) = f(n-\mu)$ are just samples of f(t-μ), where -∞ < t < ∞. Therefore, according to the sampling theorem (Proakis, J.G., Digital Signal Processing, Macmillan, New York, NY (1992)), the Fourier transform of $f_\mu(n)$ is

$$F_\mu(\omega) = \sum_{k=-\infty}^{\infty} e^{-j(\Omega - 2\pi k)\mu}\,F(\Omega - 2\pi k) \tag{4.5}$$

after we replace Ω on the right-hand-side expression by the normalized angular frequency variable ω (ω = Ω since T_s = 1). This relationship is shown in FIG. 26, where F(Ω) corresponds to the N = 8 interpolator of FIG. 11. As discussed in Section 3.2, f(t) is symmetric around t = 0. This implies that F(Ω) has zero phase. To make the f(t) filter physically realizable, of course, we must introduce a delay of N/2, where N corresponds to the length of the filter. However, this delay simply "shifts" all input samples by N/2, which is an integer because N is even, and it does not change the characteristic of the input signal. Thus, it does not influence the interpolation accuracy. Therefore, to simplify our notation, we just use F(Ω) as a real function. Hence, the phase of the complex function $e^{-j\Omega\mu}F(\Omega)$ is -Ωμ if F(Ω) > 0, or -Ωμ + π if F(Ω) < 0; that is, the phase depends on μ.

FIG. 26 shows that the frequency response of the discrete-time filter $F_\mu(\omega)$ is obtained by first making an infinite number of copies of $e^{-j\Omega\mu}F(\Omega)$ by shifting this function uniformly in successive amounts of 2π, then by adding these shifted versions to $e^{-j\Omega\mu}F(\Omega)$. As a sum of complex functions, the shape of $F_\mu(\omega)$ depends not only on the shape of the continuous-time frequency response F(Ω) but also on the value μ. The dependence of $F_\mu(\omega)$ on μ is illustrated in FIG. 27, where $F_\mu(\omega)$ is obtained from the function F(Ω) in FIG. 26, using μ = 0.12 and μ = 0.5. The magnitude of the ideal fractional-delay frequency response, defined in (4.3), is shown in both FIG. 27A and FIG. 27B as the dashed lines. It is evident that the frequency response 2706 is worse for μ = 0.5 than the frequency response 2704 is for μ = 0.12, since the response 2706 deviates more from the ideal frequency response 2702 than does the frequency response 2704. Hence, the interpolation error is larger for μ = 0.5 than for μ = 0.12. We have observed in our simulations that the largest interpolation error occurs when μ = 0.5, i.e., when the interpolated sample is exactly in the middle of the two nearest existing samples. As μ approaches 0 or 1 (i.e., as the interpolated sample gets closer to an existing sample), the interpolation error becomes smaller. Moreover, the interpolation errors obtained for μ and 1 - μ are the same.
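
The dependence of the fractional-delay response on μ can be examined numerically. The sketch below evaluates the response directly as the discrete-time Fourier transform of the N taps f(n - μ) of the unmodified trigonometric interpolator, which is an equivalent way of obtaining the sum of shifted copies described above. It assumes the kernel carries a 1/N normalization and takes exp(-jωμ) as the ideal response; the function names are hypothetical.

```python
import cmath
import math

def trig_kernel(t, N):
    """Impulse response of the length-N trigonometric interpolator (zero for |t| > N/2)."""
    if abs(t) > N / 2:
        return 0.0
    s = 1.0 + math.cos(math.pi * t)
    s += 2.0 * sum(math.cos(2 * math.pi * k * t / N) for k in range(1, N // 2))
    return s / N

def fractional_delay_response(omega, mu, N):
    """Sum of f(n - mu) * exp(-j*omega*n) over n = -N/2+1 .. N/2."""
    return sum(trig_kernel(n - mu, N) * cmath.exp(-1j * omega * n)
               for n in range(-N // 2 + 1, N // 2 + 1))

def delay_error(omega, mu, N):
    """Unweighted error magnitude against the assumed ideal response exp(-j*omega*mu)."""
    return abs(cmath.exp(-1j * omega * mu) - fractional_delay_response(omega, mu, N))
```

At ω = π the response of this kernel reduces to cos(πμ), so the error there equals sin(πμ): it is largest at μ = 0.5 and identical for μ and 1 - μ, in line with the observations above.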

4.2 Resampler Optimizations 

In Section 3, we analyzed the relationship between the weights $\hat F(k)$ in (3.9) and the frequency response of the interpolator. We have shown that we can enhance F(Ω) by adjusting the $\hat F(k)$ values. In FIG. 23 (N = 8), for example, the $\hat F(3)$ and $\hat F(5)$ values correspond to the magnitude of the pulses 2302 near the band edge. If we adjust $\hat F(3)$ and $\hat F(5)$ such that the transition between the passband and stopband is more gradual, we can achieve a better frequency response.

To further improve the interpolation performance, we could take μ into account, by optimizing $F_\mu(\omega)$ for each μ value. As in Section 3, we could adjust $\hat F(k)$ near the band edge to change the shape of F(Ω), for each μ value, such that the discrete-time frequency response $F_\mu(\omega)$, which is obtained from (4.5), best approximates the desired response of (4.3).

As discussed in Section 3, to guarantee that the original samples are not altered using the modified interpolator, the weights $\hat F(k)$ should satisfy (3.18). When N = 8, for example, we modify $\hat F(3)$ and $\hat F(5)$ together in order to satisfy (3.18). Here, however, our goal of optimization is to obtain the best approximation for the sample corresponding to a specific μ. Hence we need not be concerned with the issue of altering the original samples, as we were in Section 3, where there is only one set of optimized weights for all μ values.

Let us demonstrate this method using the example of N = 8. We chose to modify $\hat F(3)$ and $\hat F(4)$. For each given μ, we search for the $\hat F(3)$ and $\hat F(4)$ values such that (4.4) is minimized. We denote such $\hat F(3)$ and $\hat F(4)$ values by $\hat F_\mu(3)$ and $\hat F_\mu(4)$, respectively, since they are now also dependent on μ. FIGs. 28A and 28B show the modifications to F(Ω) for μ = 0.12 and μ = 0.5, respectively. The corresponding optimized $F_\mu(\omega)$ functions are illustrated in FIGs. 28C and 28D, respectively.

To demonstrate the performance improvement, let us use this example: Given μ = 0.5, we optimize $F_\mu(\omega)$ for the signal whose spectrum is shown in dashed lines 2902 in FIG. 29. Comparing the un-optimized frequency response 2904 with the optimized frequency response 2906, the modification clearly produces a better frequency response. More specifically, the response 2906 is flat in the frequency band where the power of the signal 2902 is concentrated, and the deviation from the ideal response mostly falls in the "don't care" band.

Similar to Section 3, to implement this improved method, we first compute the coefficients c_k from the existing samples as in (2.9). Then, given the μ value, we multiply, e.g., for N = 8, the c_3 and c_4 values by $\hat F_\mu(3)$ and $\hat F_\mu(4)$, respectively. Finally, we compute the synchronized sample y(μ) using (2.11), where c_3 and c_4 are replaced by $\hat F_\mu(3)c_3$ and $\hat F_\mu(4)c_4$, respectively.

We can apply similar modifications to the interpolator with N = 4. FIG. 30A shows the frequency response of the interpolator 1000, for μ = 0.5, while FIG. 30B displays the results of a modified interpolator 1000, where parameters $\hat F(1)$ and $\hat F(2)$ are optimized, for μ = 0.5, to maximize the interpolation accuracy for the signal whose spectrum is shown in dashed lines. As can be seen, the optimized response 3006 is flatter than the un-optimized response 3004 in the part of the spectrum of the signal 3002 where most of its energy is concentrated.

The flowchart 3400 in FIG. 34 generalizes the optimization of the trigonometric interpolation procedure. The flowchart 3400 is similar to the flowchart 1700, but includes the additional steps 3402 and 3404 that are described as follows.

In step 3402, a factor $\hat F_\mu$ is determined to adjust the frequency response of the trigonometric interpolator so that it is consistent with the frequency response of the N data samples and the offset μ.

In step 3404, one or more of the complex coefficients are multiplied by the $\hat F_\mu$ factor to modify the frequency response of the interpolator so that it is consistent with the input signal and the offset μ.

The optimization routine can also be used with K-modified data samples that lead to the simplified interpolator structures of FIGs. 14 and 15. The flowchart 3500 illustrates the $\hat F_\mu$ factor modification in the context of the flowchart 1900.

As will be shown in the section that follows, the steps 3402, 3404, and 1708 can be combined into a single step if a table lookup is used to determine the rotation factor. In other words, the sine and cosine values can be multiplied by the $\hat F_\mu$ factor before they are stored in the ROM.

In Section 2.6, we have presented an efficient algorithm that eliminates one angle-rotation. For example, for N = 4, we can "modify" the input samples according to (2.22). With this modification, we can treat the new samples as if the input signal satisfies c_2 = 0. The remaining non-zero coefficients are c_0 and c_1. In the example for N = 4 in the previous Section, two parameters, $\hat F_\mu(1)$ and $\hat F_\mu(2)$, are optimized to achieve the best approximation of a desired fractional-delay frequency response described by (4.3). Now, with c_2 = 0, we have only one parameter, $\hat F_\mu(1)$, to choose.

The impulse response of the simplified interpolation filter is derived in Appendix B. From the mathematical expression of the impulse response (B.5), we can obtain the corresponding frequency response. The frequency responses of the interpolator 1400 (FIG. 14) before and after applying the $\hat F_\mu(1)$ modification are shown in FIGs. 31A-B, respectively. We can see an improved frequency response 3106 over the response 3104, as the response 3106 is flatter in the part of the signal 3102 where its energy is concentrated. Furthermore, it seems that the frequency response 3106, where only c_1 is modified (c_2 = 0!), is as good as the modified response 3006 in FIG. 30B, where both c_1 and c_2 are modified.

4.3 Implementations 

It may appear that additional hardware is needed to implement the multiplication by, for example, $\hat F_\mu(1)$ for the simplified N = 4 structure. Let us examine the corresponding computations. As we know, we first compute coefficients c_0 and c_1 according to (2.25) (c_2 = 0, of course). We then compute y(μ) using

y(M) = ^(F M (l)c/^)- Kju (4.6) 

according to (2.26) and (2.27), where K is defined in (2.24). As discussed in Section 2.4, the computation

$$\mathrm{Re}\!\left(\hat F_\mu(1)\,c_1 e^{j\frac{\pi}{2}\mu}\right) = \mathrm{Re}(c_1)\,\mathrm{Re}\!\left(\hat F_\mu(1)e^{j\frac{\pi}{2}\mu}\right) - \mathrm{Im}(c_1)\,\mathrm{Im}\!\left(\hat F_\mu(1)e^{j\frac{\pi}{2}\mu}\right) \tag{4.7}$$

can be accomplished by retrieving the $\hat F_\mu(1)e^{j\frac{\pi}{2}\mu}$ value from a ROM lookup table and then multiplying Re(c_1) + j Im(c_1) by the retrieved value, since both $\hat F_\mu(1)$ and $e^{j\frac{\pi}{2}\mu}$ can be pre-determined for all μ values.

If the angle-rotation method is used, we can use a lookup table to store the $\hat F_\mu(1)$ values. In real time, after computing $c_1 e^{j\frac{\pi}{2}\mu}$ using an angle-rotation processor, we can multiply the result by $\hat F_\mu(1)$. In this case, if $\hat F_\mu(1)$ is allowed to be a complex number in optimizing performance, we then need two real multiplications to obtain $\mathrm{Re}(\hat F_\mu(1)c_1 e^{j\frac{\pi}{2}\mu})$ in (4.6). However, if we restrict $\hat F_\mu(1)$ to be a real number, we can use just one real multiplication as

$$\mathrm{Re}\!\left(\hat F_\mu(1)\,c_1 e^{j\frac{\pi}{2}\mu}\right) = \hat F_\mu(1)\,\mathrm{Re}\!\left(c_1 e^{j\frac{\pi}{2}\mu}\right). \tag{4.8}$$

According to Table 4.1, the NMSE using complex and real $\hat F_\mu(1)$ values are -37.41 dB and -37.08 dB, respectively. Therefore, the performance degradation caused by restricting $\hat F_\mu(1)$ to be a real number is insignificant.

When the table-lookup method is employed, the implementation structure for the optimal interpolator is the same as that for the interpolator 1400, except for the coefficient $c_1$, which is now multiplied by $F_\mu(1)e^{j\frac{\pi}{2}\mu}$ instead of $e^{j\frac{\pi}{2}\mu}$. The table should therefore contain the $\mathrm{Re}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ and $\mathrm{Im}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ values, rather than the $\sin\frac{\pi}{2}\mu$ and $\cos\frac{\pi}{2}\mu$ values used by the interpolator 1400. We now show that the size of the table is the same as the one storing the sine and cosine values.

Let us examine the contents of the lookup table. FIG. 32 displays the $\mathrm{Re}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ and $\mathrm{Im}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ values used by (4.7), where the real values are represented by curve 3202, and the imaginary values are represented by the curve 3204. These values are monotonic with respect to $\mu$, just like the $\sin\frac{\pi}{2}\mu$ and $\cos\frac{\pi}{2}\mu$ values for $0 < \mu < 1$. Moreover, simulations show that, when optimal values of $F_\mu(1)$ are reached, the real and imaginary components of $F_\mu(1)e^{j\frac{\pi}{2}\mu}$ display the following complementary relationship:

$$\mathrm{Im}\big(F_\mu(1)e^{j\frac{\pi}{2}\mu}\big) = \mathrm{Re}\big(F_{1-\mu}(1)e^{j\frac{\pi}{2}(1-\mu)}\big). \qquad (4.9)$$

Therefore, we need only store one of the $\mathrm{Re}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ and $\mathrm{Im}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ values. The other can be obtained by looking up the value corresponding to $1 - \mu$. This is the same technique used in storing and retrieving the sine and cosine values with the purpose of reducing the table size.
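The table-halving idea can be sketched in Python for the plain sine/cosine case, where the complementary relationship $\cos\frac{\pi}{2}\mu = \sin\frac{\pi}{2}(1-\mu)$ holds exactly. The table length `K` and the function names are assumptions for illustration; for the optimized table the same indexing would rely on (4.9), which the text reports from simulation rather than proves.

```python
import math

K = 64  # hypothetical number of mu steps; mu = i / K for i = 0..K

# Only one of the two functions is stored.
SIN_TABLE = [math.sin(math.pi / 2 * i / K) for i in range(K + 1)]


def lookup_sin(i):
    """sin((pi/2) * mu) for mu = i / K, read directly from the table."""
    return SIN_TABLE[i]


def lookup_cos(i):
    """cos((pi/2) * mu) equals sin((pi/2) * (1 - mu)): same table, mirrored index."""
    return SIN_TABLE[K - i]
```

With this indexing a single table of `K + 1` samples serves both outputs, at the cost of one index subtraction per lookup.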

Various circuit implementations of optimized interpolators having N=4 are illustrated in FIGs. 36-37. These circuit implementations are presented for example purposes only and are not meant to be limiting, as those skilled in the arts will recognize other circuit implementations based on the discussion given herein, including interpolator configurations having different N values.

FIG. 36 illustrates an optimized interpolator 3600 that is based on the simplified interpolator 1400 (FIG. 14). The interpolator 3600 includes an $F_\mu$ ROM 3602 and a multiplier 3604. The ROM 3602 stores the appropriate $F_\mu$ value indexed by $\mu$. The multiplier 3604 multiplies the complex coefficient $c_1$ by the appropriate $F_\mu$ value, and therefore optimizes the frequency response of the interpolator 3600. As discussed above, the order of the angle rotator 1010b and the multiplier 3604 can be interchanged so that the rotated complex coefficient is modified by the $F_\mu$ value.

FIG. 37 illustrates an optimized interpolator 3700 that is similar to the optimized interpolator 3600, except that the ROM 3602, the multiplier 3604, and the angle rotator 1010b are combined into a single ROM 3702 that stores the $\mathrm{Re}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ and $\mathrm{Im}(F_\mu(1)e^{j\frac{\pi}{2}\mu})$ values. Therefore, coefficient optimization and angle rotation are performed in a simultaneous and efficient manner.

It will be apparent to those skilled in the arts that the combined $F_\mu$ and angle rotator ROM 3702 can be implemented for interpolator configurations that include more than N=4 elements, based on the discussions herein.

4.4 Simulation Results

We have verified the new design with the following simulation. A baseband signal, shown in FIG. 33, with raised cosine spectrum, two samples per symbol and 40% excess bandwidth was generated. Table 4.1 compares the result for N=4 using five interpolation structures: 1) the Lagrange cubic interpolator, 2) the interpolator 1000, 3) the interpolator 1400, 4) the optimal resampler using a complex $F_\mu(1)$ value, and 5) the optimal resampler employing a real $F_\mu(1)$ value.

Using the optimal structure, the NMSE is reduced by about 4 dB relative to the method without optimization (FIG. 14 structure), from -33.51 dB to -37.41 dB in Table 4.1, without increasing the amount of hardware. Compared to the Lagrange cubic interpolator, the performance is improved by more than 6 dB, while the hardware is reduced.



Table 4.1 Comparison of interpolators for N=4.

N=4                            Lagr. cubic   Struct. in   Struct. in   Optimal            Optimal
                               struct.       FIG. 10      FIG. 14      table-lookup (a)   angle-rotat. (b)
NMSE in dB                     -31.08        -31.21       -33.51       -37.41             -37.08
Scaling multipliers            2             0            0            0                  0
Data multipliers               3             3            3            3                  4
Multipliers in critical path   4             2            2            2                  3



a. Complex $F_\mu(1)$ values are used. The table stores the $F_\mu(1)e^{j\frac{\pi}{2}\mu}$ values.

b. Real $F_\mu(1)$ values are used. The output of the angle-rotator, $\mathrm{Re}[c_1 e^{j\frac{\pi}{2}\mu}]$, is multiplied by $F_\mu(1)$. Thus, one more real multiplier is needed.

The frequency response of an optimized interpolator 1400 (FIG. 14) using a lookup table is shown in FIG. 16D. Also shown in FIGs. 16A-D are the frequency responses of the Lagrange cubic interpolator 400, the interpolator 1000, and the interpolator 1400 without optimization. The signal spectrum of FIG. 33 is shown in FIG. 16D in dashed lines 1604. The interpolation error corresponds to the gray area 1606. From FIG. 16D, the performance improvement achieved by the optimal interpolator is evident because the gray area 1606d has a lower amplitude than the corresponding gray areas 1606a-c. In addition, these improvements are accomplished without increasing the amount of hardware.

For a high-performance interpolator, we now turn to the structure described in Section 2.6.2, for N=8. Applying to the N=8 interpolator of Section 2.6.2 an approach similar to the one just discussed for N=4, we can multiply the $c_3$ coefficient by $F_\mu(3)$, whose value optimizes the frequency response $F_\mu(\omega)$ of a fractional-delay filter with delay $\mu$.

In designing the proposed N=8 interpolator, only one parameter, $F_\mu(3)$, was adjusted to minimize the MSE in (3.22).

Next, three interpolators of length N=8 were used to interpolate the signal in FIG. 33: 1) a Lagrange polynomial interpolator, 2) a Vesma-Saramaki optimal polynomial interpolator (Vesma, J., and Saramaki, T., "Interpolation filters with arbitrary frequency response for all-digital receivers," in Proc. 1996 IEEE Int. Symp. Circuits Syst. (May 1996), pp. 568-571) (with length 8 but a third degree polynomial), and 3) the proposed interpolator.

Table 4.2 shows the simulation results. These results demonstrate that our method has an NMSE more than 16 dB lower than the Lagrange interpolator, and more than 4 dB lower than the Vesma-Saramaki polynomial interpolator in (Vesma, J., and Saramaki, T., "Interpolation filters with arbitrary frequency response for all-digital receivers," in Proc. 1996 IEEE Int. Symp. Circuits Syst. (May 1996), pp. 568-571).

Table 4.2 Performance comparison.

                       Lagrange        Vesma-Saramaki   Proposed
                       interpolator    interpolator     interpolator
NMSE in dB             -45.29          -57.34           -62.17
Scaling multipliers    25              16               2
Multipliers            7               3                7



4.5 Conclusion 



Instead of optimizing $F(\Omega)$, the frequency response of the continuous-time interpolation filter, we could optimize $F_\mu(\omega)$ of the fractional-delay filter for each $\mu$ value. By doing this, better interpolation performance can be achieved, as demonstrated by the simulations.

As for the implementation complexity, when a table-lookup method is employed, the optimal interpolator does not require additional hardware, just table values that implement the coefficient optimization and angle rotation. When the angle rotation method is used, one additional real multiplier is needed.

For N=4, the optimal interpolator attained a 6 dB lower NMSE than the Lagrange cubic interpolator, while requiring less hardware.



5. A High-Speed Angle Rotation Processor



In previous Sections, an interpolation method and apparatus for timing recovery using a trigonometric polynomial have been discussed. The major computation in this method is the angle rotation, such as that performed by the angle rotator 1010 (in FIG. 10 and FIG. 14). As mentioned in Section 2.4, these operations, together with the phase rotator for carrier recovery, can be implemented by table-lookup in a ROM containing pre-computed sine and cosine values, followed by four real multipliers to perform the angle rotation (see FIG. 18). Herein, going forward, this approach will be referred to as the single-stage angle rotation. Although fast angle rotation can be achieved with efficient multiplier design techniques, for practical precision requirements, the ROM table can be quite large. For applications where low complexity and low power are the major concern, can we further reduce the amount of hardware for angle rotation with slightly more computational delay?

There are various hardware designs that accomplish angle rotations, notably the CORDIC processors (Ahn, Y., et al., "VLSI design of a CORDIC-based derotator," in Proc. 1998 IEEE Int. Symp. Circuits Syst., Vol. II (May 1998), pp. 449-452; Wang, S., et al., "Hybrid CORDIC algorithms," IEEE Trans. Comp. 46:1202-1207 (1997)), and, recently, an angle-rotation processor (Madisetti, A., et al., "A 100-MHz, 16-b, direct digital frequency synthesizer with a 100-dBc spurious-free dynamic range," IEEE J. Solid-State Circuits 34:1034-1043 (1999)). These algorithms accomplish the rotation through a sequence of subrotations, with the input to each subrotation stage depending on the output of the previous stage. In these cases, the latency is proportional to the precision of the angle.

We now propose a different approach for angle rotation. Here the rotation 
is partitioned into just two cascaded rotation stages: a coarse rotation and a fine 
rotation. The two specific amounts of rotation are obtained directly from the 
original angle without performing iterations as does CORDIC. The critical path 
is therefore made significantly shorter than that of the CORDIC-type methods. In 
addition, only a small lookup table is needed. 

In this Section, methods and apparatus for two-stage angle rotation will be described. These methods and apparatus are meant for example purposes only, and are not meant to be limiting. Those skilled in the arts will recognize other methods and apparatus for two-stage angle rotation based on the discussion given herein. These other methods and apparatus for angle rotation are within the scope and spirit of the present invention.

It will be shown that more precision and less hardware can be obtained using the two-stage angle rotator compared to the single-stage angle rotator, with slightly more computational delay. We will then show that, given an overall output precision requirement, various simplifications can be applied to the computations within the two stages to reduce the total hardware.

5.1 The Angle Rotation Problem

If we rotate a point in the X-Y plane having coordinates $(X_0, Y_0)$ counterclockwise, around the origin, by the angle $\phi$, a new point having coordinates $(X, Y)$ is obtained. It is related to the original point $(X_0, Y_0)$ as:

$$X = X_0\cos\phi - Y_0\sin\phi$$
$$Y = Y_0\cos\phi + X_0\sin\phi \qquad (5.1)$$
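For reference, (5.1) can be written directly as a short Python function. This is only a floating-point sketch of the arithmetic (the function name is chosen for this example); it says nothing about how the sine and cosine values are obtained in hardware, which is the subject of the sections that follow.

```python
import math


def rotate(x0, y0, phi):
    """Rotate (x0, y0) counterclockwise about the origin by phi radians, per (5.1)."""
    c, s = math.cos(phi), math.sin(phi)
    x = x0 * c - y0 * s
    y = y0 * c + x0 * s
    return x, y
```

The four products `x0 * c`, `y0 * s`, `y0 * c`, and `x0 * s` correspond to the four real multiplications referred to in the single-stage discussion.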
5.1.1 Single-Stage Angle Rotation

The operation in (5.1) is found in many communication applications, notably in digital mixers which translate a baseband signal to some intermediate frequency and vice versa. In addition to accomplishing (5.1) with CORDIC, a very common implementation is to store pre-computed sine/cosine values in a ROM (Tan, L. and Samueli, H., "A 200-MHz quadrature frequency synthesizer/mixer in 0.8-μm CMOS," IEEE J. Solid-State Circuits 30:193-200 (1995)). Then, in real-time, the computation in (5.1) is accomplished with a ROM access for each given $\phi$ followed by four real multiplications. This method avoids the excessive latency of the iterations performed by CORDIC and can yield lower latency than the angle-rotation method (Madisetti, A., "VLSI architectures and IC implementations for bandwidth efficient communications," Ph.D. dissertation, University of California, Los Angeles (1996)). Furthermore, a very fast circuit can be built, based on efficient multiplier design techniques. However, since the size of the ROM grows exponentially with the precision of $\phi$, a rather large ROM is required to achieve accurate results.

ROM compression can be achieved by exploiting the quarter-wave symmetry of the sine/cosine functions and such trigonometric identities as $\sin\theta = \cos(\pi/2 - \theta)$. The angle $\phi$ in the full range $[0, 2\pi]$ can be mapped into an angle $\theta \in [0, \pi/4]$. This is accomplished by conditionally interchanging the input values $X_0$ and $Y_0$, and conditionally interchanging and negating the output X and Y values (Madisetti, A., "VLSI architectures and IC implementations for bandwidth efficient communications," Ph.D. dissertation, University of California, Los Angeles (1996)). Thus, we will focus only on $\theta \in [0, \pi/4]$ and replace $\phi$ by $\theta$ in (5.1). Defining $\theta = (\pi/4)\bar{\theta}$, we must have $\bar{\theta} \in [0, 1]$.
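One possible way to realize this range reduction is sketched below in Python. The particular schedule of swaps and negations is an assumption worked out for this example and may differ from the one in the cited dissertation; the point is only that a core rotator restricted to $[0, \pi/4]$ suffices for any angle.

```python
import math


def rotate_octant(x0, y0, theta):
    """Core rotator (5.1); stands in for hardware that only covers [0, pi/4]."""
    assert -1e-9 <= theta <= math.pi / 4 + 1e-9
    c, s = math.cos(theta), math.sin(theta)
    return x0 * c - y0 * s, y0 * c + x0 * s


def rotate_full(x0, y0, phi):
    """Rotate by any phi using only rotate_octant plus swaps and sign changes."""
    phi = phi % (2 * math.pi)
    k = int(phi // (math.pi / 2))      # number of whole quarter turns
    r = phi - k * (math.pi / 2)        # remainder in [0, pi/2)
    if r > math.pi / 4:
        # Rotation by r = pi/2 - theta: swap and negate the inputs,
        # rotate by theta, then negate the second output.
        x, y = rotate_octant(-y0, -x0, math.pi / 2 - r)
        y = -y
    else:
        x, y = rotate_octant(x0, y0, r)
    for _ in range(k % 4):
        x, y = -y, x                   # one quarter turn: (x, y) -> (-y, x)
    return x, y
```

Everything outside `rotate_octant` is an interchange or a negation, so the lookup table only needs to cover one octant.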

Obviously, the sine/cosine ROM samples must be quantized because of the limited storage space for sine/cosine samples. This produces an error in the ROM output when compared to the true (unquantized) sine/cosine value, which will subsequently be referred to as the ROM quantization error. Next we examine how this error affects the output. Let $\cos\theta$ and $\sin\theta$ be quantized to N bits, to become $[\cos\theta]$ and $[\sin\theta]$, respectively. We have

$$\cos\theta = [\cos\theta] + \Delta_{\cos\theta}$$
$$\sin\theta = [\sin\theta] + \Delta_{\sin\theta} \qquad (5.2)$$

where $\Delta_{\cos\theta}$ and $\Delta_{\sin\theta}$ are the ROM quantization errors, which satisfy $|\Delta_{\cos\theta}| < 2^{-N}$ and $|\Delta_{\sin\theta}| < 2^{-N}$. The error in X due to the ROM quantization is the difference between X calculated using infinite-precision sine/cosine values and the quantized values, that is

$$\Delta_X = X_0\Delta_{\cos\theta} - Y_0\Delta_{\sin\theta}. \qquad (5.3)$$

Its upper bound is

$$|\Delta_X| < (|X_0| + |Y_0|)\,2^{-N}. \qquad (5.4)$$

5.1.2 Rotation by a Small Angle



If the rotation angle happens to be so small that

$$|\theta| < 2^{-N/3} \qquad (5.5)$$

then its sine/cosine values can be approximated as

$$\sin\theta \approx \theta \qquad (5.6)$$
$$\cos\theta \approx 1 - \theta^2/2. \qquad (5.7)$$

For such $\theta$ no table is needed. Next, we show how accurate (5.6) and (5.7) are by estimating their approximation errors.

The Taylor expansion of $\sin\theta$ near $\theta = 0$ yields

$$\sin\theta = \theta + \frac{\sin'''\xi}{6}\,\theta^3 \qquad (5.8)$$

where $\xi = h\theta$, $0 < h < 1$. Thus, since

$$|\sin'''\xi| = |\cos\xi| \le 1 \qquad (5.9)$$

and in view of (5.5), an error bound on (5.6) is

$$|\Delta_{\sin\theta}| = |\sin\theta - \theta| \le |\theta^3/6| < 2^{-N}/6. \qquad (5.10)$$

Similarly, the Taylor expansion of $\cos\theta$ yields

$$\cos\theta = 1 - \frac{1}{2}\theta^2 + \frac{\cos\xi}{24}\,\theta^4. \qquad (5.11)$$

Thus, an error bound on (5.7) is

$$|\Delta_{\cos\theta}| = |\cos\theta - (1 - \theta^2/2)| \le |\theta^4/24| \qquad (5.12)$$

which is negligible in comparison to the bound on $|\Delta_{\sin\theta}|$.
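The bounds (5.10) and (5.12) are easy to check numerically. The Python sketch below evaluates both approximation errors for a given angle; the function name and the choice N = 12 in the usage line are assumptions for illustration.

```python
import math


def small_angle_errors(theta):
    """Return (|sin(theta) - theta|, |cos(theta) - (1 - theta**2 / 2)|)."""
    err_sin = abs(math.sin(theta) - theta)
    err_cos = abs(math.cos(theta) - (1.0 - theta * theta / 2.0))
    return err_sin, err_cos


N = 12                      # example precision in bits
LIMIT = 2.0 ** (-N / 3)     # the angle bound of (5.5)
err_sin, err_cos = small_angle_errors(0.9 * LIMIT)
```

For angles inside the limit of (5.5), `err_sin` stays below $2^{-N}/6$ and `err_cos` is smaller again by roughly a factor of $\theta/4$, which is why only the sine error is carried forward in the later error budget.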






5.1.3 Partitioning into Coarse and Fine Rotations

While it is unlikely that (5.5) is satisfied for a given $\theta \in [0, \pi/4]$, i.e., $\bar{\theta} \in [0, 1]$, if we let $B > N/3$ be the number of bits in $\bar{\theta}$, then we can express $\bar{\theta} = \bar{\theta}_M + \bar{\theta}_L$ where

$$\bar{\theta}_M = d_1 2^{-1} + \dots + d_{N/3}\,2^{-N/3} \qquad (5.13)$$
$$\bar{\theta}_L = d_{N/3+1}\,2^{-N/3-1} + \dots + d_B\,2^{-B} \qquad (5.14)$$

with $d_n \in \{0, 1\}$. Next we define $\theta_M = (\pi/4)\bar{\theta}_M$ and $\theta_L = (\pi/4)\bar{\theta}_L$. Clearly, with $\pi/4 < 1$ and from (5.14), $\theta_L$ satisfies (5.5).

If we substitute $\theta = \theta_M + \theta_L$ for $\phi$ in (5.1) and expand $\cos(\theta_M + \theta_L)$ and $\sin(\theta_M + \theta_L)$, we obtain:

$$X = X_1\cos\theta_L - Y_1\sin\theta_L$$
$$Y = Y_1\cos\theta_L + X_1\sin\theta_L \qquad (5.15)$$

and

$$X_1 = X_0\cos\theta_M - Y_0\sin\theta_M$$
$$Y_1 = Y_0\cos\theta_M + X_0\sin\theta_M. \qquad (5.16)$$

Now the rotation (5.1) is decomposed into two stages: a coarse rotation (5.16) by $\theta_M$ followed by a fine rotation (5.15) by $\theta_L$. With this partitioning, the approximations (5.6) and (5.7) can be applied to the fine stage:

$$X = X_1(1 - \theta_L^2/2) - Y_1\theta_L$$
$$Y = Y_1(1 - \theta_L^2/2) + X_1\theta_L. \qquad (5.17)$$

A benefit of this partitioning is that the functions $\cos\theta_M$ and $\sin\theta_M$ in (5.16) depend only on the N/3 most significant bits of the angle $\bar{\theta}$, where $\theta = (\pi/4)\bar{\theta}$. They can be stored in a small lookup table. This results in a significant ROM size reduction. However, the approximation (5.6) introduces additional error. We now seek to achieve an overall precision comparable to that in the implementation having one stage and a large ROM table.

Defining the approximation errors $\Delta_{\sin\theta_L} = \sin\theta_L - \theta_L$ and $\Delta_{\cos\theta_L} = \cos\theta_L - (1 - \theta_L^2/2)$, and neglecting terms that are products of error terms or products of an error term and $\sin\theta_L$, which is always small, we calculate the total error in X as the difference between X calculated using (5.15) and (5.16) and X calculated using quantized $\sin\theta_M$ and $\cos\theta_M$ in (5.16) and using (5.17) instead of (5.15). We obtain:

$$\Delta_X = X_0(\Delta_{\cos\theta_M}\cos\theta_L + \Delta_{\cos\theta_L}\cos\theta_M - \Delta_{\sin\theta_L}\sin\theta_M)$$
$$\qquad - Y_0(\Delta_{\sin\theta_M}\cos\theta_L + \Delta_{\cos\theta_L}\sin\theta_M + \Delta_{\sin\theta_L}\cos\theta_M). \qquad (5.18)$$

Comparing this error estimate with (5.3) and (5.4) it is evident that, so long as the errors due to $\Delta_{\cos\theta_L}$ and $\Delta_{\sin\theta_L}$ are sufficiently small, the error $\Delta_X$ in (5.18) can be made comparable to that of (5.4) by reducing the $\Delta_{\cos\theta_M}$ and $\Delta_{\sin\theta_M}$ values, i.e., by increasing the number of bits in the sine/cosine samples stored in the ROM. For example, if we add one more bit to the sine/cosine samples, then $|\Delta_{\cos\theta_M}| < 2^{-N-1}$ and $|\Delta_{\sin\theta_M}| < 2^{-N-1}$. Therefore, from (5.18), we have

$$|\Delta_X| < |X_0|\,(2^{-N-1} + 2^{-4N/3}/24 + 2^{-N}/6) + |Y_0|\,(2^{-N-1} + 2^{-4N/3}/24 + 2^{-N}/6)$$
$$\qquad = (|X_0| + |Y_0|)\,2^{-N}\,(1/2 + 1/6 + (1/24)\,2^{-N/3}) \qquad (5.19)$$

which is smaller than (5.4). A similar relationship can be found for $\Delta_Y$. This demonstrates that, if we add one more bit of precision to the ROM for the coarse stage, we can achieve the same precision as that in the one-stage case, but with a significantly smaller ROM.
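The two-stage scheme of (5.16) and (5.17) can be tried out with a short Python model. This is a sketch under stated assumptions: N = 12, a hypothetical 16-bit normalized angle word, ROM samples rounded to N+1 fractional bits, and all other arithmetic left in floating point (so only the ROM quantization and the small-angle approximation contribute error).

```python
import math

N = 12         # target precision in bits
M = N // 3     # angle bits that address the coarse ROM
B = 16         # hypothetical width of the normalized angle word, B > N/3


def quantize(v, bits):
    """Round v to the nearest multiple of 2**-bits."""
    return round(v * 2 ** bits) / 2 ** bits


# Coarse ROM: 2**M entries of (cos, sin), each kept to N+1 fractional bits.
ROM = [(quantize(math.cos(math.pi / 4 * i / 2 ** M), N + 1),
        quantize(math.sin(math.pi / 4 * i / 2 ** M), N + 1))
       for i in range(2 ** M)]


def two_stage_rotate(x0, y0, word):
    """Rotate by theta = (pi/4) * word / 2**B: coarse (5.16), then fine (5.17)."""
    hi = word >> (B - M)                 # most significant N/3 bits
    lo = word & ((1 << (B - M)) - 1)     # remaining least significant bits
    c, s = ROM[hi]
    x1 = x0 * c - y0 * s
    y1 = y0 * c + x0 * s
    theta_l = math.pi / 4 * lo / 2 ** B  # fine angle in radians
    k = 1.0 - theta_l * theta_l / 2.0
    return x1 * k - y1 * theta_l, y1 * k + x1 * theta_l
```

With these settings the ROM holds only 16 sine/cosine pairs, whereas a single-stage table addressed by the full 16-bit word would need 65536 entries.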
A straightforward implementation of this method is illustrated by the angle rotator 3800 in FIG. 38. The angle rotator 3800 includes a ROM 3802, butterfly circuits 3806 and 3810, and fine adjustment circuit 3804.

The ROM 3802 stores the $\cos\theta_M$ and $\sin\theta_M$ values, where $\theta_M$ is the most significant part of the input angle $\theta$. In embodiments the input angle $\theta$ is normalized and represented by a binary number, so that $\bar{\theta}_M$ is the most significant word of the binary number, and $\bar{\theta}_L$ is the least significant word of the binary number.

The first butterfly circuit 3806 multiplies the input complex number 3812 by the $(\cos\theta_M)^+$ and the $(\sin\theta_M)^+$ to perform a coarse rotation, where the $(\,)^+$ denotes that the appropriate ROM quantization errors have been added to the $\cos\theta_M$ and $\sin\theta_M$, by the adders 3814.

The fine adjustment circuit 3804 generates a fine adjust value $(1 - \theta_L^2/2)$, where $\theta_L$ is the least significant word of the input angle $\theta$.

The second butterfly circuit 3810 multiplies the output of circuit 3806 by $\theta_L^+$ and the fine adjustment value from circuit 3804, to perform a fine rotation that results in the rotated complex number 3814. The + on the $\theta_L^+$ denotes that an error value $\Delta_{\sin\theta_L}$ has been added to improve the accuracy of the fine rotation.

The three error sources $\Delta_{\cos\theta_M}$, $\Delta_{\sin\theta_M}$ and $\Delta_{\sin\theta_L}$ are shown. The much smaller error source $\Delta_{\cos\theta_L}$ has been neglected. The thick line depicts the path along which the ROM quantization error $\Delta_{\cos\theta_M}$ propagates to X. The error $\Delta_{\cos\theta_M}$ is multiplied by $X_0$ and then by $\cos\theta_L$ as it propagates along this path to become $\Delta_{\cos\theta_M}X_0\cos\theta_L$ when it reaches the output. This matches the error term in (5.18) obtained from our calculation. In subsequent discussions we will use this graphical approach to find the error at the output due to various error sources.

The ROM table 3802 in the rotator 3800 contains many fewer sine/cosine samples in comparison to the number of samples needed to implement (5.1) using a conventional (single stage) table-lookup approach. Although the approximation (5.6) introduces additional error, so long as that error is smaller than the conventional ROM quantization error, we can increase the precision of the samples in our small ROM table such that, overall, precision is not sacrificed. In principle, we can reduce the hardware complexity significantly in one block of our structure, with the corresponding accuracy loss compensated by higher precision from another block, and at the cost of a slight increase in the complexity of that block. As a result, the complexity of the overall structure is reduced without loss of accuracy. We will now exploit this idea again to further reduce the computational complexity.



5.2 Simplification of the Coarse Stage 



The coarse stage, according to (5.16), involves multiplications of input data $X_0$ and $Y_0$ by the $\cos\theta_M$ and $\sin\theta_M$ values. Writing $\sin\theta_M$ as the binary number

$$\sin\theta_M = 0.b_1 \dots b_{N/3}\,b_{N/3+1} \dots \qquad (5.20)$$

where $b_n \in \{0, 1\}$, we now round $\sin\theta_M$ upward, to obtain an (N/3+1)-bit value $[\sin\theta_M]$, as

$$[\sin\theta_M] = 0.b_1 \dots b_{N/3}\,b_{N/3+1} + 2^{-(N/3+1)}, \qquad (5.21)$$

where N represents the number of bits in the real part and the imaginary part of the input complex number. In other words, the real part has N bits, and the imaginary part has N bits.

Letting $\theta_1$ be the angle for which

$$\sin\theta_1 = [\sin\theta_M] \qquad (5.22)$$

we must have $\theta_1 \ge \theta_M$. Next we can compute the corresponding $\cos\theta_1$ value. Using the $\sin\theta_1 = [\sin\theta_M]$ and $\cos\theta_1$ values, we actually rotate the point having coordinates $(X_0, Y_0)$ by $\theta_1$, instead of $\theta_M$, as

$$X_1 = X_0\cos\theta_1 - Y_0\sin\theta_1$$
$$Y_1 = Y_0\cos\theta_1 + X_0\sin\theta_1. \qquad (5.23)$$

Since $\theta_1 = \arcsin([\sin\theta_M])$ and, of course, $\theta_M = \arcsin(\sin\theta_M)$, applying the mean value theorem we have

$$\frac{\theta_1 - \theta_M}{[\sin\theta_M] - \sin\theta_M} = \mathrm{asin}'\,\xi = \frac{1}{\sqrt{1 - \xi^2}} \qquad (5.24)$$

where $\xi$ satisfies $\sin\theta_M \le \xi \le [\sin\theta_M]$. Since $\sin\theta_M \le 1/\sqrt{2}$, according to (5.21) we must have

$$\xi \le [\sin\theta_M] \le \sin\theta_M + 2^{-(N/3+1)} \le \frac{1}{\sqrt{2}} + 2^{-(N/3+1)}. \qquad (5.25)$$

For most applications, $N \ge 9$. Thus, according to (5.25), we have $\xi \le 0.7696$. Applying this value to (5.24),

$$\theta_1 - \theta_M \le \frac{1}{\sqrt{1 - 0.7696^2}}\,([\sin\theta_M] - \sin\theta_M) \le 1.566 \times 2^{-(N/3+1)}. \qquad (5.26)$$

Because $0 \le \theta_M \le \pi/4$, we have, for $N \ge 12$, that

$$0 \le \theta_1 \le 0.0978 + \pi/4 = 0.8832. \qquad (5.27)$$

The resulting fine-stage angle is $\theta - \theta_1$, instead of $\theta_L = \theta - \theta_M$. Thus, as in (Madisetti, A., "VLSI architectures and IC implementations for bandwidth efficient communications," Ph.D. dissertation, University of California, Los Angeles (1996)), a modified fine-stage angle compensates for a simplified coarse-stage angle. Since $\sin\theta_1 = [\sin\theta_M]$, by rotating by $\theta_1$, the (N/3+1)-bit number $\sin\theta_1$ decreases the number of partial products needed in computing $X_0\sin\theta_1$ and $Y_0\sin\theta_1$ to just over a third of those needed for $X_0\sin\theta_M$ and $Y_0\sin\theta_M$. This simplifies the computation in (5.23). However, if we can also reduce the multiplier size in computing $X_0\cos\theta_1$ and $Y_0\cos\theta_1$, we can further simplify (5.23). Certainly, truncating the $\cos\theta_1$ value would reduce the number of partial products in computing $X_0\cos\theta_1$ and $Y_0\cos\theta_1$. Let us truncate $\cos\theta_1$ to 2N/3 bits to obtain $[\cos\theta_1]$. Then,

$$0 \le \Delta_{\cos\theta_1} = \cos\theta_1 - [\cos\theta_1] < 2^{-2N/3}. \qquad (5.28)$$

We now have

$$X_1 = X_0[\cos\theta_1] - Y_0\sin\theta_1$$
$$Y_1 = Y_0[\cos\theta_1] + X_0\sin\theta_1. \qquad (5.29)$$

Apparently, by truncating $\cos\theta_1$, smaller multipliers are needed. But the amount of rotation is no longer $\theta_1$. We now examine the effect on $\theta_1$ of using the truncated value $[\cos\theta_1]$ instead of $\cos\theta_1$. The angle actually realized is

$$\theta_m = \mathrm{atan}\,\frac{\sin\theta_1}{[\cos\theta_1]}. \qquad (5.30)$$

First, we determine how $\theta_m$ is different from $\theta_1$ due to the truncation of $\cos\theta_1$. Letting $\cos\theta_1$ and $[\cos\theta_1]$ denote specific values of a variable z, we consider the function

$$\Theta(z) = \mathrm{atan}\,\frac{\sin\theta_1}{z}. \qquad (5.31)$$

Hence, $\theta_1$ and $\theta_m$ are the $\Theta(z)$ values corresponding to $z_1 = \cos\theta_1$ and $z_2 = [\cos\theta_1]$, i.e., $\theta_1 = \Theta(z_1)$ and $\theta_m = \Theta(z_2)$. According to the mean value theorem, we have

$$\frac{\Theta(z_1) - \Theta(z_2)}{z_1 - z_2} = \Theta'(\xi) \qquad (5.32)$$



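or

$$\frac{\theta_1 - \theta_m}{\cos\theta_1 - [\cos\theta_1]} = \mathrm{atan}'\,\frac{\sin\theta_1}{\xi} \qquad (5.33)$$

where $[\cos\theta_1] \le \xi \le \cos\theta_1$. The negation of the derivative $\mathrm{atan}'\,\frac{\sin\theta_1}{\xi}$ satisfies

$$-\mathrm{atan}'\,\frac{\sin\theta_1}{\xi} = \frac{\sin\theta_1/\xi^2}{1 + (\sin\theta_1/\xi)^2} = \frac{\sin\theta_1}{(\sin\theta_1)^2 + \xi^2} \le \frac{\sin\theta_1}{(\sin\theta_1)^2 + [\cos\theta_1]^2}. \qquad (5.34)$$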

According to (5.27), for $N \ge 9$ we have

$$0 \le \sin\theta_1 \le 0.7727$$
$$0.6347 \le \cos\theta_1 \le 1 \qquad (5.35)$$
$$0.6406 \le [\cos\theta_1] \le 1.$$

Since $[\cos\theta_1] = \cos\theta_1 - \Delta_{\cos\theta_1}$, from (5.28) we also have

$$(\sin\theta_1)^2 + [\cos\theta_1]^2 = (\sin\theta_1)^2 + (\cos\theta_1 - \Delta_{\cos\theta_1})^2 = 1 - 2\cos\theta_1\,\Delta_{\cos\theta_1} + \Delta_{\cos\theta_1}^2 > 1 - 2^{-2N/3+1}. \qquad (5.36)$$

Combining (5.35) and (5.36), one can verify, for $N \ge 9$, that (5.34) satisfies

$$-\mathrm{atan}'\,\frac{\sin\theta_1}{\xi} < 0.7976. \qquad (5.37)$$

Thus, according to (5.28) and (5.33), we have

$$\theta_m - \theta_1 \le (\cos\theta_1 - [\cos\theta_1]) \times 0.7976 < 0.7976 \times 2^{-2N/3}. \qquad (5.38)$$

Combining (5.26) and (5.38), and for $N \ge 9$, we have

$$0 \le \theta_m - \theta_M < 1.566 \times 2^{-(N/3+1)} + 0.7976 \times 2^{-2N/3} < 0.8827 \times 2^{-N/3}. \qquad (5.39)$$

This is the amount of coarse rotation error, due to coarse-stage simplifications, that a modified fine stage must compensate. Let us examine the bound on the fine-stage angle.

Now, the fine rotation angle is $\theta_l = \theta - \theta_m$ instead of $\theta_L$. If $\theta_l$ satisfies

$$|\theta_l| < 2^{-N/3} \qquad (5.40)$$

then we have $|\sin\theta_l - \theta_l| < 2^{-N}/6$. That is, the approximations $\sin\theta_l \approx \theta_l$ and $\cos\theta_l \approx 1 - \theta_l^2/2$ can be applied as discussed in Section 5.1. Let us now examine the bound on $\theta_l$. By definition,

$$0 \le \theta_L = \frac{\pi}{4}\bar{\theta}_L < 0.7854 \times 2^{-N/3}. \qquad (5.41)$$

Therefore, subtracting (5.39) from (5.41) yields

$$-0.8827 \times 2^{-N/3} < \theta_L - (\theta_m - \theta_M) < 0.7854 \times 2^{-N/3} \qquad (5.42)$$

which implies (5.40) because

$$\theta_l = \theta - \theta_m = \theta_M + \theta_L - \theta_m = \theta_L - (\theta_m - \theta_M). \qquad (5.43)$$

Hence, no lookup table is needed for the fine stage.

Next, we examine the magnitude of the complex input sample after rotation. One can verify from (5.29) that

$$X_1^2 + Y_1^2 = (X_0^2 + Y_0^2)\,([\cos\theta_1]^2 + (\sin\theta_1)^2). \qquad (5.44)$$

Since $[\cos\theta_1]$ is obtained by truncating $\cos\theta_1$, we must have $0 \le [\cos\theta_1] \le \cos\theta_1$, thus

$$[\cos\theta_1]^2 + (\sin\theta_1)^2 \le (\cos\theta_1)^2 + (\sin\theta_1)^2 = 1. \qquad (5.45)$$

Therefore,

$$X_1^2 + Y_1^2 \le X_0^2 + Y_0^2. \qquad (5.46)$$
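To maintain the magnitude, the results $X_1$ and $Y_1$ must then be multiplied by $1/\sqrt{[\cos\theta_1]^2 + (\sin\theta_1)^2}$. We define a new variable $\delta_{[\cos\theta_1]}$ such that

$$\frac{1}{\sqrt{[\cos\theta_1]^2 + (\sin\theta_1)^2}} = 1 + \delta_{[\cos\theta_1]}. \qquad (5.47)$$

Since $\sqrt{(\cos\theta_1)^2 + (\sin\theta_1)^2} = 1$, and $[\cos\theta_1]$ is very close to $\cos\theta_1$ because of (5.28), we have that $\sqrt{[\cos\theta_1]^2 + (\sin\theta_1)^2}$ is very close to 1. Thus, the $\delta_{[\cos\theta_1]}$ value must be very small. We now examine the bound on $\delta_{[\cos\theta_1]}$. We can write $1/\sqrt{[\cos\theta_1]^2 + (\sin\theta_1)^2}$ as

$$\frac{1}{\sqrt{[\cos\theta_1]^2 + (\sin\theta_1)^2}} = \sqrt{\frac{(\cos\theta_1)^2 + (\sin\theta_1)^2}{[\cos\theta_1]^2 + (\sin\theta_1)^2}}. \qquad (5.48)$$

Substituting (5.28) into (5.48), we have

$$\frac{1}{\sqrt{[\cos\theta_1]^2 + (\sin\theta_1)^2}} = \sqrt{1 + \frac{2[\cos\theta_1]\,\Delta_{\cos\theta_1} + \Delta_{\cos\theta_1}^2}{[\cos\theta_1]^2 + (\sin\theta_1)^2}}. \qquad (5.49)$$

Because (5.28) and (5.35) imply that $\Delta_{\cos\theta_1} \ll [\cos\theta_1]$, we have $\Delta_{\cos\theta_1}^2 \ll [\cos\theta_1]\,\Delta_{\cos\theta_1}$, hence we can omit $\Delta_{\cos\theta_1}^2$ in (5.49). Defining

$$\delta = \frac{[\cos\theta_1]\,\Delta_{\cos\theta_1}}{[\cos\theta_1]^2 + (\sin\theta_1)^2} \qquad (5.50)$$

then (5.49) becomes $\sqrt{1 + 2\delta}$. From (5.28) and (5.35) we must have $\delta \ge 0$. Applying the mean-value theorem to $\sqrt{1 + 2\delta}$ we have

$$\frac{\sqrt{1 + 2\delta} - \sqrt{1 + 0}}{\delta - 0} = \frac{1}{\sqrt{1 + 2\zeta}} \le 1 \qquad (5.51)$$

where $0 \le \zeta \le \delta$. Hence,

$$\sqrt{1 + 2\delta} \le 1 + \delta. \qquad (5.52)$$

According to (5.35),

$$\frac{[\cos\theta_1]}{[\cos\theta_1]^2 + (\sin\theta_1)^2} < 1 \qquad (5.53)$$

and therefore, from (5.28) and (5.50), we have $0 \le \delta < 2^{-2N/3}$. By definition, in (5.47), we have $\sqrt{1 + 2\delta} = 1 + \delta_{[\cos\theta_1]}$. Thus $\delta_{[\cos\theta_1]} \le \delta$. Hence $\delta_{[\cos\theta_1]}$ is bounded by

$$0 \le \delta_{[\cos\theta_1]} < 2^{-2N/3}. \qquad (5.54)$$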



According to (5.40) and (5.54), instead of storing the $\sin\theta_M$ and $\cos\theta_M$ values in ROM, we may store $\sin\theta_1$, which has N/3 + 1 bits for each sample, and $[\cos\theta_1]$, which has 2N/3 bits. Given $\bar{\theta}_M$, the $\sin\theta_1$ and $[\cos\theta_1]$ values are retrieved from the ROM to be used in performing the coarse rotation. Since the actual angle $\theta_m$ differs from $\theta_M = (\pi/4)\bar{\theta}_M$, we must also store the $\theta_M - \theta_m$ values, so that the fine stage can compensate for the difference. The approximations (5.6) and (5.7) still apply to $\theta_l$, in view of (5.40). In addition, the change of magnitude in the rotation using the $\sin\theta_1$ and $[\cos\theta_1]$ values, as seen in (5.45), must also be compensated. Therefore we store the $\delta_{[\cos\theta_1]}$ values in order to scale the coarse-stage output by $1 + \delta_{[\cos\theta_1]}$.

We can now implement the coarse rotation stage as in (5.29). Later we will show that the scale factor $1 + \delta_{[\cos\theta_1]}$ can be combined with the scaling that will be done for the second stage (i.e., the fine stage) at its output.
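The ROM contents described above can be generated with a few lines of Python. The sketch below follows (5.21), (5.28), (5.30), and (5.47) for one ROM address; the function name and the return layout are assumptions for this example, and floating-point `floor` stands in for fixed-point truncation.

```python
import math


def coarse_rom_entry(index, N):
    """Return (sin_theta1, cos_trunc, theta_M - theta_m, delta) for one ROM address."""
    m = N // 3
    theta_M = (math.pi / 4) * index / 2 ** m

    # (5.21): keep N/3+1 fractional bits of sin(theta_M), then add one LSB.
    s_step = 2.0 ** -(m + 1)
    sin1 = math.floor(math.sin(theta_M) / s_step) * s_step + s_step
    theta1 = math.asin(sin1)

    # (5.28): truncate cos(theta_1) to 2N/3 fractional bits.
    c_step = 2.0 ** -(2 * m)
    cos_t = math.floor(math.cos(theta1) / c_step) * c_step

    # (5.30): the angle actually realized by the simplified coarse stage.
    theta_m = math.atan2(sin1, cos_t)

    # (5.47): magnitude correction for the shortened cosine.
    delta = 1.0 / math.sqrt(cos_t ** 2 + sin1 ** 2) - 1.0
    return sin1, cos_t, theta_M - theta_m, delta
```

For N = 12 this yields 16 entries; in each, the stored compensation angle $\theta_M - \theta_m$ is negative and lies within the bound of (5.39), so the modified fine angle still satisfies (5.40).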

To compute $\theta_l$, we must first convert the normalized $\bar{\theta}_L$ value to the radian value $\theta_L$, which involves a multiplication by $\pi/4$. Since $\pi/4 = 2^{-1} + 2^{-2} + 2^{-5} + 2^{-8} + 2^{-13} + \dots$, if we multiply $0 \le \bar{\theta}_L < 2^{-N/3}$ by $(2^{-1} + 2^{-2} + 2^{-5} + 2^{-8})$, this product and $(\pi/4)\bar{\theta}_L$ differ by no more than $2^{-12} \times 2^{-N/3} = 2^{-(N/3+12)}$, which is sufficiently small for a 12-bit system. (And two more bits would suffice for building a 16-bit system.)
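The four-term approximation of $\pi/4$ can be checked with the following Python sketch. It models only the coefficient error: the divisions stand in for right shifts, and the additional truncation that integer shifts would introduce is not modeled. The names are chosen for this example.

```python
import math

# Four-term binary approximation of pi/4 quoted in the text.
PI_OVER_4_APPROX = 2.0 ** -1 + 2.0 ** -2 + 2.0 ** -5 + 2.0 ** -8


def to_radians_shift_add(theta_bar_l):
    """Approximate (pi/4) * theta_bar_l with four shifted copies and three adds."""
    return (theta_bar_l / 2 + theta_bar_l / 4
            + theta_bar_l / 32 + theta_bar_l / 256)
```

The coefficient error `math.pi / 4 - PI_OVER_4_APPROX` is just under $2^{-12}$, so for $\bar{\theta}_L < 2^{-N/3}$ the result is within $2^{-(N/3+12)}$ of the exact product.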

5.3 Reduction of Multiplier Size in the Fine Stage

In the fine rotation stage, the computations involved in generating $X_2$ are

$$X_2 = X_1(1 - \theta_l^2/2) - Y_1\theta_l. \qquad (5.55)$$

Since $|\theta_l| < 2^{-N/3}$, it follows that $\theta_l$ can be expressed as

$$\theta_l = s.s \dots s\,\theta_{N/3+1} \dots \theta_{2N/3}\,\theta_{2N/3+1} \dots \qquad (5.56)$$

where s is the sign bit. The N/3 MSBs do not influence the result. This property helps to reduce the size of the multipliers that implement (5.55). Even more savings in hardware can be achieved by further reducing multiplier size, with just a small loss of accuracy.

Let $[Y_1]$ represent the 2N/3 MSBs of $Y_1$ as in

$$Y_1 = s.y_1 \dots y_{2N/3}\,y_{2N/3+1} \dots = [Y_1] + \Delta_{Y_1}. \qquad (5.57)$$

Then we must have $|\Delta_{Y_1}| < 2^{-2N/3}$. The error contributed to the product $Y_1\theta_l$ by using $[Y_1]$ instead of $Y_1$ is

$$|Y_1\theta_l - [Y_1]\theta_l| = |\Delta_{Y_1}\theta_l| < 2^{-N}. \qquad (5.58)$$

Therefore, for N-bit precision, the multiplication $Y_1\theta_l$ can be accomplished with a (2N/3) x (2N/3) multiplier.
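The effect of shortening $Y_1$ before the multiplication can be illustrated with a small Python model of (5.57) and (5.58). The helper names are assumptions for this sketch, and floating-point `floor` is used to imitate two's-complement truncation.

```python
import math


def truncate(v, bits):
    """Keep `bits` fractional bits of v, discarding the rest (rounds toward -inf)."""
    return math.floor(v * 2 ** bits) / 2 ** bits


def product_error(y1, theta_l, N):
    """|Y1*theta_l - [Y1]*theta_l| with [Y1] holding only 2N/3 fractional bits."""
    y1_trunc = truncate(y1, 2 * N // 3)
    return abs(y1 * theta_l - y1_trunc * theta_l)
```

As long as `abs(theta_l)` is below $2^{-N/3}$, the returned error stays under $2^{-N}$, which is the reason the narrower multiplier is adequate for N-bit output precision.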

This method can be applied to the computation of $\theta_l^2/2$. Defining $[\theta_l]$ as the 2N/3 MSBs of $\theta_l$, and letting $\Delta_{\theta_l}$ denote the remaining LSBs, we have

$$[\theta_l] = s.s \dots s\,\theta_{N/3+1} \dots \theta_{2N/3} \qquad (5.59)$$

and

$$\Delta_{\theta_l} = \theta_l - [\theta_l]. \qquad (5.60)$$

The error in calculating $\theta_l^2/2$ using $[\theta_l]$ instead of $\theta_l$ is

$$\left|([\theta_l] + \Delta_{\theta_l})^2/2 - [\theta_l]^2/2\right| \approx \left|[\theta_l]\,\Delta_{\theta_l}\right| < 2^{-N}. \qquad (5.61)$$

Thus $\theta_l^2$ can be implemented with an (N/3) x (N/3) multiplier, since the N/3 MSBs of $[\theta_l]$ are just sign-bits.

5.4 Scaling Multiplier Simplification



10 



15 



20 



As mentioned in Section 5.2, the scale factor 1+ 8, , can be applied at 

' [cos6^J 

the output of the fine stage. A straightforward implementation would use the full 
wordlength of 1 + 8, , in the product X= X 2 ( 1 + 5< A which would require 

a multiplier of size N* N. But this multiplier's size can be reduced as follows: 
According to (5.54), 0 < 8 } < Moving the factor 1+8 , into the 

fine stage, we have 

X, = X,(l- 81 /2)(l + Y A (l + (5.62) 

= ^ + ^,(^l-«f/2)-W (5.63) 

The only significant error in approximating (5.62) by (5.63) is the absence of the 
OA , term in the factor multiplying Y x . But this is tolerable since, according 

* [cos6^j 

to (5.54) and (5.40), 



OA 



COS^ 



< 2" 



(5.64) 



In view of (5.40) we have 0 < 0 ; 2 < 2 -2W/3 which, combined with (5.54), yields 



cosflj * 



< 2 



-2N/3 



(5.65) 



Thus, if we truncate S cos6 - 8//2 to Ambits, only the least significant N/3 bits in 



the truncated result will be non-sign bits. Therefore, in our computation of 
X x { - 9//2) in (5.63), if we truncate X x to NI 3 bits, we can use an (N I 3) 



x (N/3) multiplier, with the product's error bound being 



IcostfJ ' 



2 -N,3 < 2 -N 



(5.66) 
By merging the scale factor of the coarse stage into the fine stage, we thus replace multiplications by the scale factor by additions. The final architecture is shown in FIG. 39, where the size of the multipliers is shown in FIG. 40.

FIG. 39 illustrates an angle rotator 3900 according to embodiments of the invention. The angle rotator 3900 includes a ROM 3902, a fine adjustment circuit 3904, a first butterfly circuit 3908, and a second butterfly circuit 3910. The angle rotator 3900 rotates an input complex signal 3906 according to angle $\theta$ to produce a rotated complex output signal 3912. The angle $\theta$ can be broken down into a most significant portion (or word) $\theta_M$ and least significant portion (word) $\theta_L$. Note that normalized angle values are shown in FIG. 39, as represented by the $\bar\theta$ nomenclature. However, normalized angle values are not required, as will be understood by those skilled in the arts.

The ROM 3902 stores the following for each corresponding $\theta$: $\sin\theta_1$, $[\cos\theta_1]$, $\delta_{[\cos\theta_1]}$, and $\theta_M - \theta_m$, where all of these values have been exactly defined in preceding sections. To summarize, the $\sin\theta_1$ and $[\cos\theta_1]$ values are MSBs of $\sin\theta_M$ and $\cos\theta_1$, respectively. The $\delta_{[\cos\theta_1]}$ error value represents the difference between the $\cos\theta_M$ and the $[\cos\theta_1]$ value. (The exact definition for $\delta_{[\cos\theta_1]}$ is given in (5.47).) Likewise, the $(\theta_M - \theta_m)$ error value represents the difference between $\sin\theta_M$ and the $\sin\theta_1$ value. (The exact definition for $\theta_m$ is given in equation (5.30).)

The butterfly circuit 3908 includes multiple multipliers and adders as 
shown. The implementation of these multipliers and adders is well known to those 
skilled in the arts. In embodiments, the sizes of the multipliers and adders are as 
shown in FIG. 40. Note that savings are obtained on the size of the multipliers 

because of bit-truncated approximations that are described above. This produces

a faster and more efficient angle rotator compared to other angle rotator schemes. 

The operation of the angle rotator 3900 is further described in reference 
to the flowchart 4100. As with all flowcharts herein, the order of the steps is not limiting, as one or more steps can be performed simultaneously (or in a different order) as will be understood by those skilled in the arts.

In step 4102, the input complex signal is received.

In step 4104, the $\sin\theta_1$, $[\cos\theta_1]$, $\delta_{[\cos\theta_1]}$, and $\theta_M - \theta_m$ values are retrieved from the ROM 3902, based on the rotation angle $\theta$.

In step 4106, the butterfly circuit 3908 multiplies the input complex signal by the $\sin\theta_1$ and $[\cos\theta_1]$ values to perform a coarse rotation of the input complex signal, resulting in an intermediate complex signal at the output of the butterfly circuit 3908.

In step 4108, the adder 3914 adds the $\theta_L$ value to the error value $\theta_M - \theta_m$ to produce a $\theta_l$ angle.

In step 4110, a fine adjustment circuit 3904 generates a fine adjust value $(\delta_{[\cos\theta_1]} - \theta_l^2/2)$ based on the $\theta_l$ angle and $\delta_{[\cos\theta_1]}$.

In step 4112, the butterfly circuit 3910 multiplies the intermediate complex signal by the $\theta_l$ angle and the fine adjustment value $(\delta_{[\cos\theta_1]} - \theta_l^2/2)$ to perform a fine rotation of the intermediate complex number, resulting in the output complex signal 3912.

In embodiments, the ROM 3902 storage space is $2^{N/3}$ words, where N is the bit size of the real or imaginary input complex number 3906. Therefore, the overall size of the ROM 3902 can be quite small compared with other techniques. This occurs because of the two-stage coarse/fine rotation configuration of the angle rotator 3900, and the saving from storing $\sin\theta_1$ and $[\cos\theta_1]$ instead of $\sin\theta$ and $\cos\theta$. Also, there is another advantage to having a small ROM: in certain technologies it is awkward to implement a ROM. Thus, if only a small ROM is needed, it is possible to implement the ROM's input/output relationship by combinatorial logic circuits instead of employing a ROM. Such circuits will not consume an unreasonable amount of chip area if they need only be equivalent to a small ROM.
5.5 Computational Accuracy and Wordlength

In this section we study the effect of quantization errors on the final 
output's computational accuracy and the most efficient way to quantize the data 
for a given accuracy. 

In our algorithm, the errors can be classified into three categories. The first 
category is the quantization of the values in the ROM table. The second category 
is the error due to the truncation of data before multiplications, to reduce 
multiplier size. The third type of error is that resulting from approximating $\sin\theta_l$ by $\theta_l$. Quantization errors are marked in FIG. 40 with an $\varepsilon$ marker as shown. The total error can be obtained by combining the errors propagated from each source. To calculate the propagated error at the output with a given error at the source, we can first identify all paths by which the error reaches the output and then use the approach discussed in Section 5.1.3. Let us first examine all the error sources and determine their effects on X, which is the real component of the output complex signal 3912. Table 5.1 displays this information. (Similar results apply to Y.)

The values stored in the ROM are $\sin\theta_1$, $[\cos\theta_1]$, $\theta_M - \theta_m$ and $\delta_{[\cos\theta_1]}$, where $\sin\theta_1$ and $[\cos\theta_1]$ are MSBs of $\sin\theta_M$ and $\cos\theta_1$, respectively. A loss of precision due to ROM quantization error depends only on the number of bits used in representing $\theta_M - \theta_m$ and $\delta_{[\cos\theta_1]}$.
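Table 5.1 Effect of errors at the X output

Error | Error source | Error at the output
$\varepsilon_1$ | quantizing $(\theta_M - \theta_m)$ in ROM | $Y_1\varepsilon_1$
$\varepsilon_2$ | quantizing $\delta_{[\cos\theta_1]}$ in ROM | $X_1\varepsilon_2$
$\varepsilon_3$ | truncating $Y_1$ for $Y_1\theta_l$ | $-\theta_l\varepsilon_3$
$\varepsilon_4$ | truncating $X_1$ for $X_1(\delta_{[\cos\theta_1]} - \theta_l^2/2)$ | $(\delta_{[\cos\theta_1]} - \theta_l^2/2)\,\varepsilon_4$
$\varepsilon_5$ | truncating $\theta_l$ for $\theta_l^2$ | $-\theta_l X_1\varepsilon_5$
$\varepsilon_6$ | truncating $\theta_l^2$ for $(\delta_{[\cos\theta_1]} - \theta_l^2/2)$ | $X_1\varepsilon_6$
$\varepsilon_7$ | quantizing $\theta_L = (\pi/4)\,\bar\theta_L$ | $Y_1\varepsilon_7$
$\varepsilon_8$ | quantizing X at the output | $\varepsilon_8$
| approximating $\sin\theta_l$ by $\theta_l$ | $Y_1\theta_l^3/6$
| neglecting $\delta_{[\cos\theta_1]}\theta_l$ in (5.62) | $Y_1(-\delta_{[\cos\theta_1]}\theta_l)$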

The total error in X can be obtained by combining all the terms in the third column of Table 5.1:

$$\varepsilon_X = Y_1(\varepsilon_1 + \varepsilon_7) + Y_1\theta_l\left(\frac{\theta_l^2}{6} - \delta_{[\cos\theta_1]}\right) + X_1(\varepsilon_2 + \varepsilon_6) - X_1\theta_l\varepsilon_5 - \theta_l\varepsilon_3 + \left(\delta_{[\cos\theta_1]} - \frac{\theta_l^2}{2}\right)\varepsilon_4 + \varepsilon_8. \tag{5.67}$$

Since $\varepsilon_6$ in Table 5.1 is a truncation error, we have $\varepsilon_6 \ge 0$. If we quantize $\delta_{[\cos\theta_1]}$ by rounding it upward before storing it in ROM, then $\varepsilon_2 \le 0$. This way such errors tend to cancel each other. Cancelling errors are grouped together in (5.67) since the magnitude of their combined error is no greater than the larger of the two. This yields seven terms in (5.67), each contributing a maximum possible error of $2^{-N}$. If the multiplier sizes are as indicated in FIG. 40, the total error in X is bounded by $7 \times 2^{-N}$.

From the above analysis it can be seen that the computation errors resulting from hardware reduction have similar magnitudes and no particular source dominates. This seems to provide the best trade-off between the hardware complexity and the accuracy of the entire system.

According to (5.67), the total output error can be reduced by increasing the internal data wordlength and the wordlength of each sample in the ROM. For each bit increase, we get one more bit of precision at the output. Therefore, we can design the processor to have the minimum hardware for a given precision requirement. Next, we give a simulation example to illustrate this method.

Example: A cosine waveform with an error less than $2^{-12}$ is specified. According to (5.67), we chose N = 15, as indicated in FIG. 40. We obtained the maximum error to be approximately $5 \times 10^{-5}$, which is considerably smaller than $2^{-12}$.
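The choice N = 15 in this example can be reproduced from the $7 \times 2^{-N}$ bound alone. The sketch below is illustrative only; the function names are hypothetical, and it assumes the designer simply wants the smallest multiple of 3 for which the bound falls below the specified error.

```python
def total_error_bound(n: int) -> float:
    """Worst-case X-output error bound 7 * 2^-N from the seven terms of (5.67)."""
    return 7.0 * 2.0 ** -n

def smallest_wordlength(target_error: float) -> int:
    """Smallest N that is a multiple of 3 and satisfies 7 * 2^-N < target_error."""
    n = 3
    while total_error_bound(n) >= target_error:
        n += 3
    return n

n = smallest_wordlength(2.0 ** -12)
print(n, total_error_bound(n) < 2.0 ** -12)
```

For a target of $2^{-12}$, N = 12 fails the bound ($7 \times 2^{-12}$ is too large) while N = 15 satisfies it, which agrees with the wordlength used in the example.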

In FIG. 40, the ROM is shown as having $2^{N/3}$ words to achieve no more than a total error of $7 \times 2^{-N}$ in the X output. If N is not a multiple of 3, we can choose the smallest $N' > N$ that is a multiple of 3. Having $2^{N'/3}$ words in ROM, of course, suffices to achieve the required precision. As discussed before, the total output error is a combination of errors from various sources, such as from quantizing the data before multiplications and from approximating $\sin\theta_l$ by $\theta_l$, etc. However, our error bound estimation is rather conservative. Hence, the ROM size can be perturbed to determine the minimum size to satisfy a specific precision requirement. Our experience in designing the angle-rotation processor has shown that, even by rounding down to determine the ROM size, the total error is still less than $7 \times 2^{-N}$.

5.6 Comparison with the Single-Stage Mixer

As mentioned earlier, the main advantage of the two-stage angle rotator 
is that it requires only a small ROM 3902. For the single-stage angle rotation, the 
ROM size grows exponentially with the precision of the angle. Thus, our two- 
stage method is well-suited for applications where more than 14 bits in the input 
angle are required. In this case, the sine lookup table for the single-stage angle-rotator, even with compression, is too large for high-speed operations (Vankka, J., "Methods of mapping from phase to sine amplitude in direct digital synthesis," IEEE Trans. Ultrasonics, Ferroelectrics and Freq. Control 44: 526-534 (1997)). However, the following comparison of our method to a well-known
single-stage method with 14-bit input angle shows that even in this case our 
method has advantages, and this is true even when the single-stage method is 
optimized for that particular precision requirement. 

To compare, we use the quadrature direct digital frequency synthesizer/ 
mixer (QDDFSM) with 14-bit input angle and 12-bit input data that is reported 
in (Tan, L. and Samueli, H., "A 200-MHz quadrature frequency synthesizer/mixer in 0.8-μm CMOS," IEEE J. Solid-State Circuits 30: 193-200 (1995)). It achieves 84.3 dB spurious free dynamic range (SFDR). According to this method, the sine and cosine values are generated using a DDFS, which employs lookup tables for these values. To reduce the ROM size, ROM compression techniques are used. The DDFS is followed by four 12 × 12 real multiplications.

For our structure, we chose the internal wordlengths and multiplier sizes 
as indicated in FIG. 42. The phase accumulator that generates $\theta$, as well as the circuit that maps an angle in the range [0, 2π] into [0, π/4], are described in (Madisetti, A., "VLSI architectures and IC implementations for bandwidth efficient communications," Ph.D. dissertation, University of California, Los Angeles (1996)). These structures are also employed here in our test. Truncating the 32-bit phase word to 14 bits, this structure has achieved a SFDR of 90.36 dB, as shown in FIG. 43. This is 6 dB better than the single-stage method.

The integrated circuit that implements this structure is currently being built. A preliminary estimation of its hardware complexity yields a similar transistor count as that of (Tan, L. and Samueli, H., IEEE J. Solid-State Circuits 30: 193-200 (1995)). Thus, using approximately the same number of transistors, our structure achieves a 6 dB performance gain. Our structure requires a much smaller ROM (17 × 25 = 425 bits) in comparison to the single-stage method, which needs a 3072-bit ROM when the ROM compression technique is employed. Since the ROM access is hard to pipeline, it is usually the bottleneck in the data path, thereby limiting the achievable data rate. Hence, one pronounced benefit of having a much smaller ROM would be the much faster ROM access. Also, there is another advantage to having a small ROM: in certain technologies it is awkward to implement a ROM. Thus, if only a small ROM is needed, it is possible to implement the ROM's input/output relationship by combinatorial logic circuits instead of employing a ROM. Such circuits will not consume an unreasonable amount of chip area if they need only be equivalent to a small ROM.

5.7 A Modified Structure When Only One Output is Needed 

In some applications, such as the implementation of the trigonometric interpolator discussed in the previous sections, only one output, say X, is needed. In such cases, obviously, we can eliminate certain computations used to generate Y. However, using the angle rotator 3900, only those generating Y in the fine stage are subject to deletion, while the coarse stage must remain the same, since we need both $X_1$ and $Y_1$ to generate the X output. Let us seek to further simplify the coarse stage by attempting to eliminate one multiplication by $\cos\theta_M$.
5.7.1 Modifications to the Coarse Stage

If we factor out the $\cos\theta_M$ term of the coarse stage in (5.16), we can then apply the factor $\cos\theta_M$ to the output of the second stage in (5.17), because the two operations (scaling and rotation) are permutable, to obtain

$$X_1 = X_0 - Y_0\tan\theta_M, \qquad Y_1 = Y_0 + X_0\tan\theta_M \tag{5.68}$$

$$X = \cos\theta_M\left[X_1\left(1 - \frac{\theta_L^2}{2}\right) - Y_1\theta_L\right], \qquad Y = \cos\theta_M\left[Y_1\left(1 - \frac{\theta_L^2}{2}\right) + X_1\theta_L\right]. \tag{5.69}$$

In this case, we have only two multipliers in the coarse stage (5.68), and the multiplications by the scale factor $\cos\theta_M$ are applied to the output of the fine stage (5.69). Unlike the situation in (5.16) and (5.17), if only one output from the angle rotator, say X, is needed, we can also eliminate one more multiplier: the one that multiplies the coarse stage output with the $\cos\theta_M$ factor. As in Section 5.2, we now seek to simplify the coarse stage in (5.68).

Let $\tan\theta_m$ be $\tan\theta_M$ rounded upward at the (N/3)-rd bit. In other words, writing $\tan\theta_M$ as the binary number

$$\tan\theta_M = 0.b_1 \ldots b_{N/3}\,b_{N/3+1} \ldots \tag{5.70}$$

where $b_n \in \{0, 1\}$, $\tan\theta_m$ is obtained from $\tan\theta_M$ according to

$$\tan\theta_m = 0.b_1 \ldots b_{N/3} + 2^{-N/3}. \tag{5.71}$$

Obviously,

$$0 < \tan\theta_m - \tan\theta_M \le 2^{-N/3}. \tag{5.72}$$

The N/3-bit number $\tan\theta_m$ decreases the number of partial products needed in computing $X_0\tan\theta_m$ and $Y_0\tan\theta_m$ to at most a third of those needed for $X_0\tan\theta_M$ and $Y_0\tan\theta_M$.
The resulting fine-stage angle is $\theta_l = \theta - \theta_m$. Thus, as in Section 5.2, a modified fine-stage angle compensates for a simplified coarse-stage angle. If $\theta_l$ satisfies (5.40), we then have $|\sin\theta_l - \theta_l| < 2^{-N}/6$. That is, the approximations $\sin\theta_l \approx \theta_l$ and $\cos\theta_l \approx 1 - \theta_l^2/2$ can be applied. The proof that (5.40) holds is as follows:

Proof. According to the mean value theorem

$$\frac{\tan\theta_m - \tan\theta_M}{\theta_m - \theta_M} = \tan'\xi \tag{5.73}$$

where $\xi = \theta_M + (\theta_m - \theta_M)h$, $0 \le h \le 1$. The derivative $\tan'\xi$ satisfies

$$\tan'\xi = 1 + (\tan\xi)^2 \ge 1, \quad \text{for every } \xi. \tag{5.74}$$

Re-arranging (5.73), and using (5.74), we have

$$\theta_m - \theta_M = \frac{\tan\theta_m - \tan\theta_M}{\tan'\xi} \le \tan\theta_m - \tan\theta_M. \tag{5.75}$$

Hence, according to (5.72),

$$0 < \theta_m - \theta_M \le 2^{-N/3}. \tag{5.76}$$

By definition,

$$0 \le \theta_L < 2^{-N/3}. \tag{5.77}$$

Therefore, subtracting (5.76) from (5.77) yields

$$-2^{-N/3} \le \theta_L - (\theta_m - \theta_M) < 2^{-N/3} \tag{5.78}$$

which is exactly (5.40) because

$$\theta_l = \theta - \theta_m = \theta_M + \theta_L - \theta_m = \theta_L - (\theta_m - \theta_M). \tag{5.79}$$

This concludes our proof.
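The bound (5.78) can also be confirmed by exhaustive enumeration of the coarse angles. The sketch below is a floating-point check under stated assumptions: the function name `fine_angle_extremes` is hypothetical, and $\theta_M$ is taken to be π/4 times the N/3 most significant bits of a normalized angle in [0, 1), so that $\theta_L$ ranges over $[0, (\pi/4)\,2^{-N/3})$.

```python
import math

def fine_angle_extremes(n: int):
    """Return (min, max) of theta_l = theta_L - (theta_m - theta_M) over all coarse words."""
    third = n // 3
    step = 2.0 ** -third
    lo, hi = 0.0, 0.0
    for k in range(1 << third):                      # every N/3-bit coarse word
        theta_M = (math.pi / 4) * k * step
        # (5.71): keep N/3 fractional bits of tan(theta_M), then add 2^(-N/3)
        tan_m = math.floor(math.tan(theta_M) / step) * step + step
        theta_m = math.atan(tan_m)
        offset = theta_m - theta_M                   # the quantity bounded in (5.76)
        lo = min(lo, 0.0 - offset)                   # theta_L at its smallest value
        hi = max(hi, (math.pi / 4) * step - offset)  # theta_L at its upper limit
    return lo, hi

lo, hi = fine_angle_extremes(15)
print(lo, hi, 2.0 ** -5)
```

For N = 15 both extremes lie strictly inside $\pm 2^{-5}$, as the proof predicts.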

This indicates that, instead of storing the $\tan\theta_M$ values in the ROM, we may store $\tan\theta_m$, which has N/3 bits for each sample, and we may store $\theta_m - \theta_M$. This results in a reduction of the multiplier size in the coarse stage. The difference between $\theta_m$ and $\theta_M$ can be compensated in the following fine rotation stage. Furthermore, the approximations (5.6) and (5.7) still apply to $\theta_l$, in view of (5.40). We can now implement the coarse rotation stage as follows:

$$X_1 = X_0 - Y_0\tan\theta_m, \qquad Y_1 = Y_0 + X_0\tan\theta_m. \tag{5.80}$$

Accordingly, the scale factor at the output of the fine stage is $\cos\theta_m$ instead of $\cos\theta_M$. Since $\theta_l$ satisfies (5.40), the fine stage simplification is similar to the method described in Section 5.3. Next we examine how the multiplications of the fine-stage output by $\cos\theta_m$ can be simplified.

5.7.2 Scaling Multiplier Simplification

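A straightforward implementation would use the full wordlength of $\cos\theta_m$ in the product $X = X_2\cos\theta_m$, which would require a multiplier of size N × N. But this multiplier's size can be reduced as follows: By defining $[\cos\theta_m]$ as the 2N/3+1 MSBs of $\cos\theta_m$, the scale factor can be written as

$$\cos\theta_m = [\cos\theta_m] + \Delta_{\cos\theta_m} = [\cos\theta_m]\left(1 + \frac{\Delta_{\cos\theta_m}}{[\cos\theta_m]}\right). \tag{5.81}$$

Let us define $\delta_{\cos\theta_m} = \Delta_{\cos\theta_m}/[\cos\theta_m]$ and, since $0 \le \theta_m \le \pi/4$, we surely have $[\cos\theta_m] > 0.5$, and hence

$$0 \le \delta_{\cos\theta_m} < 2 \times 2^{-2N/3-1}. \tag{5.82}$$

Moving the factor $1 + \delta_{\cos\theta_m}$ into the fine stage, we have

$$X_2 = X_1(1 - \theta_l^2/2)(1 + \delta_{\cos\theta_m}) - Y_1\theta_l(1 + \delta_{\cos\theta_m}) \tag{5.83}$$

$$\approx X_1 + X_1(\delta_{\cos\theta_m} - \theta_l^2/2) - Y_1\theta_l. \tag{5.84}$$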
The only significant error in approximating (5.83) by (5.84) is the absence of the $\theta_l\delta_{\cos\theta_m}$ term in the factor multiplying $Y_1$. But this is tolerable since, according to (5.40) and (5.82),

$$|\theta_l\,\delta_{\cos\theta_m}| < 2^{-N}. \tag{5.85}$$

In view of (5.40) we have $0 \le \theta_l^2 < 2^{-2N/3}$ which, combined with (5.82), yields

$$|\delta_{\cos\theta_m} - \theta_l^2/2| < 2^{-2N/3}. \tag{5.86}$$

Thus, if we truncate $\delta_{\cos\theta_m} - \theta_l^2/2$ to N bits, only the least significant N/3 bits in the truncated result will be non-sign bits. Therefore, in our computation of $X_1(\delta_{\cos\theta_m} - \theta_l^2/2)$ in (5.84), if we truncate $X_1$ to N/3 bits, we can use an (N/3) × (N/3) multiplier, with the product's error bound being

$$|\delta_{\cos\theta_m} - \theta_l^2/2|\,2^{-N/3} < 2^{-N}. \tag{5.87}$$

The factorization of $\cos\theta_m$ in (5.81) allows a reduction of the multiplier to approximately 2/3 its original size. In this case, the values of $[\cos\theta_m]$ and $\delta_{\cos\theta_m}$ are stored in the ROM instead of $\cos\theta_m$.

The modified structure for one output is illustrated as angle rotator 4400 in FIG. 44. The angle rotator 4400 includes a ROM 4402, a fine adjustment circuit 4404, a first butterfly circuit 4408, and a second butterfly circuit 4410. The angle rotator 4400 rotates an input complex signal 4406 according to angle $\theta$ to produce a rotated complex output signal 4412. As with the rotator 3900, the angle $\theta$ can be broken down into a most significant portion (or word) $\theta_M$ and least significant portion (word) $\theta_L$. Note that normalized angle values are shown in FIGs. 39, 40, 42, and 44, as represented by the $\bar\theta$ nomenclature. However, normalized angle values are not required, as will be understood by those skilled in the arts.
The ROM 4402 stores the following for each corresponding normalized $\bar\theta$: $\tan\theta_m$, $[\cos\theta_m]$, $\delta_{\cos\theta_m}$, and $\theta_M - \theta_m$, where all of these values have been exactly defined in preceding sections.

In the butterfly circuit 4410, the arithmetic units that are encircled by the line 4418 can be eliminated when only the X output is needed in the output signal 4412. This may be desirable for applications where only one output from the angle rotator 4400 is needed, such as when implementing a trigonometric interpolator, such as interpolator 1000 in FIG. 10 or interpolator 1400 in FIG. 14.

The operation of the angle rotator 4400 is further described in reference 
to the flowchart 4500 in FIG. 45. As with all flowcharts herein, the order of the 
steps is not limiting, as one or more steps can be performed simultaneously (or in 
a different order) as will be understood by those skilled in the arts. 

In step 4502, the input complex signal 4406 is received.

In step 4504, the $\tan\theta_m$, $[\cos\theta_m]$, $\delta_{\cos\theta_m}$, and $\theta_M - \theta_m$ values are retrieved from the ROM 4402, based on the rotation angle $\theta$ (or the normalized value $\bar\theta$).

In step 4506, the butterfly circuit 4408 multiplies the input complex signal 4406 by $\tan\theta_m$ to perform a coarse rotation of the input complex number, resulting in an intermediate complex signal at the output of the butterfly circuit 4408.

In step 4508, the adder 4414 adds the $\theta_L$ value to the error value $\theta_M - \theta_m$ to produce a $\theta_l$ angle.

In step 4510, a fine adjustment circuit 4404 generates a fine adjust value $(\delta_{\cos\theta_m} - \theta_l^2/2)$ based on the $\theta_l$ angle and $\delta_{\cos\theta_m}$.

In step 4512, the butterfly circuit 4410 multiplies the intermediate complex signal by the $\theta_l$ angle and the fine adjustment value $(\delta_{\cos\theta_m} - \theta_l^2/2)$ to perform a fine rotation of the intermediate complex signal, resulting in the output complex signal.

In step 4514, the X value for the output complex signal is scaled by the $[\cos\theta_m]$ value, resulting in the output complex number 4412. As discussed above, the elements inside the outline 4418 can be eliminated if only the X value of signal 4412 is desired. Alternatively, similar elements could be eliminated from the butterfly circuit 4410 if only the Y value of signal 4412 was desired.
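Steps 4502 through 4514 can be modelled in a few lines of floating-point code. This is a behavioural sketch only, not the hardware: the function name `rotate_x` is hypothetical, the ROM contents are computed on the fly rather than looked up, no multiplier truncation is modelled, and $[\cos\theta_m]$ is assumed to keep 2N/3+1 fractional bits so that (5.82) holds.

```python
import math

def rotate_x(x0: float, y0: float, theta_bar: float, n: int = 15) -> float:
    """X output of the one-output rotator; theta_bar in [0, 1) is theta / (pi/4)."""
    third = n // 3
    step = 2.0 ** -third
    # Split the normalized angle into its N/3 MSBs and the remaining LSBs.
    theta_bar_M = math.floor(theta_bar / step) * step
    theta_M = (math.pi / 4) * theta_bar_M
    theta_L = (math.pi / 4) * (theta_bar - theta_bar_M)
    # Values that ROM 4402 would hold for this coarse word.
    tan_m = math.floor(math.tan(theta_M) / step) * step + step        # (5.71)
    theta_m = math.atan(tan_m)
    cos_scale = 2.0 ** (2 * third + 1)
    cos_msbs = math.floor(math.cos(theta_m) * cos_scale) / cos_scale  # [cos theta_m]
    delta = (math.cos(theta_m) - cos_msbs) / cos_msbs                 # delta_{cos theta_m}
    # Step 4506: coarse rotation (5.80).
    x1 = x0 - y0 * tan_m
    y1 = y0 + x0 * tan_m
    # Step 4508: fine angle.
    theta_l = theta_L + (theta_M - theta_m)
    # Steps 4510-4512: fine rotation with the merged scale factor (5.84).
    x2 = x1 + x1 * (delta - theta_l ** 2 / 2) - y1 * theta_l
    # Step 4514: scale by [cos theta_m].
    return cos_msbs * x2

theta_bar = 0.37
theta = (math.pi / 4) * theta_bar
exact = 0.6 * math.cos(theta) - 0.8 * math.sin(theta)
print(abs(rotate_x(0.6, 0.8, theta_bar) - exact) < 2 * 2.0 ** -15)
```

With a unit-magnitude input, the only modelled error sources are the small-angle approximations and the neglected $\theta_l\delta_{\cos\theta_m}$ term, so the result stays within a small multiple of $2^{-N}$ of the exact rotation.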

5.8 Application of Angle Rotation Processors

This subsection describes exemplary applications for angle rotator processors. These applications are provided for example purposes only and are not meant to be limiting, as those skilled in the arts will recognize other applications based on the discussions given herein. These other applications are within the scope and spirit of the present invention.

One application for the angle rotation processor is the Quadrature Direct Digital Frequency Synthesizer/Mixer (QDDFSM), including a few special cases that are candidates for the angle rotator algorithm. One is the case when only one of the outputs (X or Y) is desired, as shown by angle rotator 4400 (FIG. 44). As shown in FIG. 44, this is accomplished by simply deleting the hardware required for the computation of the unused output. Yet another special case of QDDFSM is the Direct Digital Frequency Synthesizer (DDFS). In DDFS configuration we simply fix the input vector $(X_0, Y_0)$ to be (1, 0). This enables the complete elimination of the coarse stage by taking advantage of the fact that 1 × A = A and 0 × A = 0. In the following section we will concentrate our discussion on the QDDFSM, since it is the general case, while keeping in mind the special cases and the associated hardware reductions mentioned above.

5.8.1 Using the Angle Rotation Processor in a Quadrature Direct Digital Frequency Synthesizer/Mixer



The frequency synthesis and mixing operation can be described with the following pair of equations, which relate an input with x-y coordinates $(X_0, Y_0)$ and a frequency control word (fcw) for the synthesizer, to an output with new x-y coordinates (X, Y):

$$X = [X_0 \times \cos(\mathrm{fcw} \times n)] - [Y_0 \times \sin(\mathrm{fcw} \times n)]$$
$$Y = [Y_0 \times \cos(\mathrm{fcw} \times n)] + [X_0 \times \sin(\mathrm{fcw} \times n)] \tag{5.88}$$

where n is the time index.

Per (5.88), since the sine and cosine functions are periodic with period 2π (i.e., $\mathrm{fcw} \times n = \langle \mathrm{fcw} \times n \rangle_{2\pi} = \phi$, where $\langle\,\rangle$ is a modulo operator), an overflowing adder is used as a phase accumulator to compute $\phi$ from the input fcw, as shown by the phase accumulator 4600 in FIG. 46.
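The modulo-2π behaviour of an overflowing adder can be illustrated with a short model. This is a sketch under assumptions: the class name `PhaseAccumulator` and its methods are hypothetical, and the 32-bit register width and 14-bit truncated output are borrowed from the test configuration described in Section 5.6.

```python
class PhaseAccumulator:
    """Overflowing adder: the register wraps modulo 2**width, i.e. modulo 2*pi in phase."""

    def __init__(self, width: int = 32) -> None:
        self.width = width
        self.mask = (1 << width) - 1
        self.phase = 0

    def step(self, fcw: int) -> int:
        """Add the frequency control word and return the wrapped phase word."""
        self.phase = (self.phase + fcw) & self.mask
        return self.phase

    def truncated(self, out_bits: int) -> int:
        """Top out_bits of the phase word, as would be passed on as the angle."""
        return self.phase >> (self.width - out_bits)

acc = PhaseAccumulator(width=32)
fcw = 1 << 30                      # a quarter of a full turn per sample
phases = [acc.step(fcw) for _ in range(5)]
print(phases, acc.truncated(14))
```

The fourth sample wraps to zero because four quarter turns overflow the 32-bit register, which is exactly the modulo operation in the text; no explicit reduction by 2π is ever computed.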

Now, for any given time instance n, we have a corresponding angle $\phi$ from the phase accumulator, hence the original pair of equations (5.88) for QDDFSM can be rewritten in terms of the angle $\phi$ as follows.

$$X = [X_0 \times \cos\phi] - [Y_0 \times \sin\phi]$$
$$Y = [Y_0 \times \cos\phi] + [X_0 \times \sin\phi] \tag{5.89}$$

Note that the expressions (5.89) are exactly those of an angle rotator expressed by equations (5.1). By applying a phase accumulator fed by an fcw, we have converted the QDDFSM into an angle rotation application. The only conflict between the above expressions and the angle rotation processor is that the angle rotation processor takes an angle $\theta$ in the range [0, π/4], while the angle $\phi$ in the above expressions is in the interval [0, 2π).






5.8.1.1 A General Angle Rotator for Arbitrary Input 
Angles 

Let us consider the changes necessary to make the angle rotation processor use an input angle $\phi$ that may lie outside the [0, π/4) range. Fortunately, a simple interchange operation at the input of the coarse stage, and an interchange/negate operation at the output of the fine stage is all we need in order to map $\phi$ into an angle $\theta$ in the range [0, π/4] and use it as the input to the angle rotator. Even though the input angle $\theta$ is in the range [0, π/4], the rotation by $\theta$ along with the interchange and interchange/negate operations make the overall rotation of the input $(X_0, Y_0)$ equivalent to a rotation by the original angle $\phi$ in the full range [0, 2π). The latter is possible because of the convenient symmetry properties of sine and cosine functions over the range [0, 2π].

For example, $\sin\phi = -\sin(\phi - \pi)$ and $\cos\phi = -\cos(\phi - \pi)$, while $\sin\phi = \cos(\phi - \pi/2)$ and $\cos\phi = -\sin(\phi - \pi/2)$, and finally, for $0 < \phi < \pi/4$, if we write $\pi/4 + \phi$ for $\phi$ then $\sin(\pi/4 + \phi) = \cos(\pi/4 - \phi)$ and $\cos(\pi/4 + \phi) = \sin(\pi/4 - \phi)$. Using the first pair of trigonometric identities, we can map $\phi$ into the range [0, π) by simply performing a negate operation at the output of the angle rotator. Using the second pair of identities along with the first pair enables one to map $\phi$ into the range [0, π/2) by performing negate and interchange operations at the output of the angle rotator. Finally, using all three pairs of identities, the angle $\phi$ can be mapped into the range [0, π/4) by performing an interchange operation at the input of the angle rotator, along with interchange and negate operations at the output of the angle rotator. Note that all of these interchange and negate operations are conditioned only on the issue of which octant $\phi$ is in. This means that if $\phi$ is a normalized angle, then the interchange and negate decisions depend only on the top three MSB bits of $\phi$. The following tables show the interchange and negate operations required for all eight octants (specified by the three MSB bits of $\phi$). It is evident, as well, that other interchange and negate criteria for the input and output would also be suitable.
This table indicates when an interchange operation is required at the input and when an interchange operation is required at the output of the angle rotator.

Octant of φ (3 MSBs of φ) | Output Interchange | Input Interchange
1-st octant (000) | | 
2-nd octant (001) | | Interchange inputs
3-rd octant (010) | Interchange outputs | 
4-th octant (011) | Interchange outputs | Interchange inputs
5-th octant (100) | | 
6-th octant (101) | | Interchange inputs
7-th octant (110) | Interchange outputs | 
8-th octant (111) | Interchange outputs | Interchange inputs

The following table indicates when a negation operation is required at the output of the angle rotator.

Octant of φ (3 MSBs of φ) | Negation of output X | Negation of output Y
1-st octant (000) | | 
2-nd octant (001) | Negate output X | 
3-rd octant (010) | | Negate output Y
4-th octant (011) | Negate output X | Negate output Y
5-th octant (100) | Negate output X | Negate output Y
6-th octant (101) | | Negate output Y
7-th octant (110) | Negate output X | 
8-th octant (111) | | 


Note that the flag for input interchange is simply the 3rd MSB bit of $\phi$, while the flag for output interchange is just the 2nd MSB bit of $\phi$. Finally, to produce the remapped angle $\theta$ in the range [0, π/4) for the angle rotation processor, we simply take the remaining bits of $\phi$ after stripping the top two MSBs and performing a conditional subtract operation to produce $\theta$. More specifically, if the MSB bit (after stripping the two MSB bits) is low, i.e., the angle is in an even octant (numbering them 0, ..., 7), we pass the angle unchanged, otherwise we perform a "two's-complement type" inversion of the angle. Note here that after such remapping operation, the MSB bit of $\theta$ is set to one only in the case when $\theta = \pi/4$. This fact is useful in determining the required amount of lookup table in the angle rotation processor. In other words, even though the MSB bit of $\theta$ is an address to the lookup table, since we know that when it is '1' the remaining bits have to all be '0' we only need to allocate a single address for that case (as opposed to increasing the size of the lookup table by an entire factor of two).
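The two octant tables and the conditional subtract can be exercised against a direct rotation. The sketch below is illustrative: the names `OCTANT_FLAGS`, `rotate_first_octant`, and `general_rotate` are hypothetical, an ideal floating-point rotation stands in for the angle rotator, and the negations are assumed to act on the rotator outputs before the output interchange (an ordering inferred here because it reproduces the direct rotation, not one stated in the text).

```python
import math

# (interchange inputs, interchange outputs, negate X, negate Y) for octants 000..111
OCTANT_FLAGS = [
    (False, False, False, False),
    (True,  False, True,  False),
    (False, True,  False, True),
    (True,  True,  True,  True),
    (False, False, True,  True),
    (True,  False, False, True),
    (False, True,  True,  False),
    (True,  True,  False, False),
]

def rotate_first_octant(x, y, theta):
    """Stand-in for the angle rotator; only ever called with theta in [0, pi/4]."""
    return (x * math.cos(theta) - y * math.sin(theta),
            y * math.cos(theta) + x * math.sin(theta))

def general_rotate(x0, y0, phi):
    """Rotate (x0, y0) by phi in [0, 2*pi) using only a first-octant rotation."""
    octant = int(phi // (math.pi / 4)) % 8
    residue = phi - octant * (math.pi / 4)
    # Conditional subtract: odd octants (0-based) use pi/4 minus the residue.
    theta = (math.pi / 4 - residue) if octant & 1 else residue
    swap_in, swap_out, neg_x, neg_y = OCTANT_FLAGS[octant]
    if swap_in:
        x0, y0 = y0, x0
    x, y = rotate_first_octant(x0, y0, theta)
    if neg_x:
        x = -x
    if neg_y:
        y = -y
    if swap_out:
        x, y = y, x
    return x, y

print(general_rotate(1.0, 0.0, 3 * math.pi / 4 + 0.1))
```

In the table encoding, the input-interchange flag is the least significant of the three octant bits and the output-interchange flag is the middle bit, matching the observation about the 3rd and 2nd MSBs of $\phi$.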

5.8.1.2 Adapting the General Angle Rotator to Make a 
QDDFSM 

The structure of the QDDFSM using an angle rotation processor 3900 is depicted as structure 4700 in FIG. 47. It simply requires the employment of a phase accumulator 4702 and a conditional subtract 4704 to provide an input angle from the input frequency control word fcw. We refer to the system of FIG. 47 with the phase accumulator excluded as a General Angle Rotator. It has the capability to receive an angle in the interval [0, 2π) and to perform an angle rotation of the input data $(X_0, Y_0)$ by that angle. We show a general angle rotator in FIG. 48, but one in which further structural simplification has been made. The method of performing these simplifications will be discussed next.

5.8.2 How to Use the Conditionally Negating Multipliers in 
the General Angle Rotator 

For a moment assume we have a powerful technique for making 
conditionally negating multipliers. What we mean by that is a multiplier which 
takes a negate flag to produce an output depending on that negate flag as follows: 
The output is simply the product of the input signals if the flag is low (0) and the 
output is the negative of the product of the input signals if the flag is high (1). 

Each one of the two outputs in the coarse and fine stages is computed with 
two multipliers and one adder as shown in Fig. 47. These multipliers and the 
adder are implemented in a single Carry-Save Adder (CSA) tree, with the partial 
products being generated from Booth decode modules corresponding to the two 
multipliers. This technique of employing a single tree eliminates the need for 
intermediate carry propagation from each multiplier and makes the propagation 
delay of each rotation stage very short. Note that the single CSA tree
implementation is possible since the multipliers are operating in parallel. 
Furthermore, because the structure that is needed to compute one output of a 
rotation stage is identical to the structure required by the other output (with the 
exception of the minus sign), a single CSA tree can easily be interleaved between 
the two outputs for a significant amount of hardware savings. The minus sign at 
the output of the multiplier can be implemented very efficiently by the technique 
described in the following sections (using the conditionally negating multiplier). 
The negation or non-negation of the multiplier output can be controlled with a flag 
that changes between the two cycles of the interleave operation. 

The angle at the output of the conditional subtract module 4704 in Fig. 47 is in the range [0, π/4]. As already discussed, the outputs for the angles outside this range are constructed by mapping the angle into the range [0, π/4] while
conditionally interchanging the inputs (inputs to the coarse stage) and 
conditionally interchanging and negating the outputs (outputs of the fine stage) of 
the angle rotator. A negation at the output of the fine stage simply means changing 
the output signs of the multipliers and negating the input of the adder coming from 
the input of the fine rotation stage. Changing the output signs of the multipliers 
is once again accomplished by using conditionally negating multipliers. The 
negation of the input to the fine rotation stage can easily be implemented with 
XOR gates and a conditional high or low bit insertion into the CSA tree at the 
position corresponding to the LSB location of the input. Since this conditional 
high or low bit is inserted in the CSA tree, there is no additional carry propagation 
introduced for the negation of the input. Note that the latter technique eliminates 
any circuitry required to implement the conditional negation of the outputs, and 
hence eliminates any carry propagations associated with two's complement 
numbers. 
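
The XOR-plus-inserted-bit negation described above rests on the two's complement identity -x = (NOT x) + 1. The following is a minimal Python sketch of that idea, not the circuit of FIG. 48: the function name, the `width` parameter, and the modeling of the CSA tree as a plain integer sum are all illustrative assumptions.

```python
def add_with_conditional_negate(acc, x, negate, width):
    """Return (acc + x) or (acc - x) modulo 2**width without a separate negator.

    Hypothetical model: a bank of XOR gates inverts every bit of x when
    `negate` is 1, and the extra '1' that completes the two's complement
    negation is just one more input bit at the LSB position of the adder.
    """
    mask = (1 << width) - 1
    x_bits = (x & mask) ^ (mask if negate else 0)   # bank of XOR gates
    inserted_lsb = 1 if negate else 0               # conditional high/low bit
    return (acc + x_bits + inserted_lsb) & mask     # one summation, no extra carry chain


print(add_with_conditional_negate(100, 37, 0, 12))  # 137
print(add_with_conditional_negate(100, 37, 1, 12))  # 63
```

Because the inserted bit is summed together with all other operands, the model performs only one addition, which mirrors the claim that no additional carry propagation is introduced.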

Furthermore, the conditional interchange of the outputs can be 
implemented by conditionally interchanging the inputs of the fine rotation stage 
and appropriately controlling the signs of the multiplier outputs in the fine stage.
The conditional interchange of the fine stage inputs can be propagated to the 
inputs of the coarse stage with the same line of reasoning. Remember that the 
inputs to the coarse stage were conditionally interchanged according to the three 
MSBs of the input angle anyway. In conclusion, the conditional interchange and 
negation operations of the outputs can be implemented by modifying only the 
condition of the interchange at the inputs of the coarse stage and appropriately 
controlling the multiplier output signs by using conditionally negating multipliers 
(which we had to do for interleaving anyway). This eliminates the conditional 
negate and interchange block at the output of the fine stage entirely (i.e., it 
eliminates muxes and two's complement negators), and also eliminates the need 
for storing and pipelining control signals (i.e., it eliminates registers) to perform 
the conditional interchange and negation operations at the output. The resulting 
General Angle Rotator 4800 is depicted in FIG. 48.

5.8.2.1 Booth Multiplier

There are many algorithms for digital multiplication. One of the most 
popular is the Booth multiplier. The essence of the Booth multiplier is in the 
decoding scheme performed on the multiplicand to reduce the number of partial 
products which, when added together, produce the desired product. For an N × M
Booth multiplier, where N is the wordlength of the multiplier, and M is the
wordlength of the multiplicand, there will be ceiling(N/2) Booth decoders. Each
Booth decoder will take three bits from the multiplier (with one bit overlapping
the decoders on both sides) and will manipulate the multiplicand according to the
Booth decoding table 5000 shown in Fig. 50. Some relevant details for a 10 × M
Booth multiplier are depicted in Fig. 49, especially how the multiplier bits feed 
into the Booth decoders to produce the five partial products which, when added, 
compute the result (the product of the multiplier and the multiplicand).

5.8.2.2 How to Make a Negating Booth Multiplier

Suppose we wish to make a multiplier that produces the negative of the 
product. More specifically, suppose we wish to multiply two signals N and M and
get -C = -(N × M). The latter can be accomplished in a number of different ways.
The most obvious is perhaps to use a regular multiplier to produce the product C
= (N × M) and then negate C to achieve -C = -(N × M). In case of two's
complement representation, this approach requires an additional carry propagation 
chain through the negator, which is costly in terms of speed and additional 
hardware associated with a negating circuit. Another approach, described below, 
is more favorable in a few key aspects. 

The product C is essentially the result of adding a number of partial 
products, which are generated by the Booth decode blocks as described in the 
previous section. Therefore, we can write the following sum expression for C: 

$$C = \sum_{i=1}^{n} p_i$$

where p_i are the n (in the 10 × M example above, n = 5) partial products generated
from the n Booth decoders. Note that, in order to negate C, we can negate all of
the partial products and proceed with the summation of the negated partial
products to produce -C. The expression for -C is then the following:

$$-C = \sum_{i=1}^{n} (-p_i) \tag{5.91}$$

where -p_i are the n negated partial products generated from the n Booth decoders.
Let us investigate how the Booth decoders need to change to produce the desired 
negated partial products. All we need to do is to change the decoding table 5000 
from that of Fig. 50, to the decoding table 5100 in Fig. 51. Note that the 
difference between the tables is only in the partial product columns and, more 
specifically, the partial product column 5102 of table 5100 is the negative of the 
partial product column 5002 of table 5000. This means that by simply modifying
the Booth decode table to the negating Booth decode table shown in Fig. 51, the
result will be the negative of the product, as desired, with absolutely no additional
hardware and absolutely no speed penalty. An example for a 10 × M negating
Booth multiplier 5200 is shown in Fig. 52. 
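
The effect of swapping decode tables can be checked numerically. The sketch below uses the standard radix-4 Booth digit table, which is assumed (not confirmed by the figures, which are not reproduced here) to correspond to tables 5000 and 5100. All function and variable names are illustrative.

```python
# Standard radix-4 Booth digits indexed by the triplet (b2, b1, b0); each value
# is the multiple of the multiplicand A used as the partial product.
# Assumption: this matches decoding table 5000.
BOOTH = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}
# Negating table (assumed to match table 5100): every partial product negated.
NEG_BOOTH = {bits: -digit for bits, digit in BOOTH.items()}


def booth_triplets(multiplier, n_bits):
    """Split an n_bits two's complement multiplier into overlapping triplets."""
    assert n_bits % 2 == 0, "sketch assumes an even multiplier wordlength"
    u = multiplier & ((1 << n_bits) - 1)                  # two's complement pattern
    bits = [0] + [(u >> i) & 1 for i in range(n_bits)]    # bits[0]: implicit 0 below the LSB
    return [(bits[2 * i + 2], bits[2 * i + 1], bits[2 * i])
            for i in range(n_bits // 2)]


def booth_multiply(multiplier, multiplicand, n_bits, table=BOOTH):
    """Sum the partial products; partial product i carries weight 4**i."""
    return sum(table[t] * multiplicand * 4 ** i
               for i, t in enumerate(booth_triplets(multiplier, n_bits)))


print(len(booth_triplets(-300, 10)))               # 5 partial products for 10 bits
print(booth_multiply(-300, 77, 10))                # -23100
print(booth_multiply(-300, 77, 10, NEG_BOOTH))     # 23100
```

Only the lookup table differs between the two calls; the summation is unchanged, which is the sense in which the negation costs no extra hardware.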

5.8.2.3 How to Make a Conditionally Negative Booth 
Multiplier 

A particularly interesting case arises when one wishes the multiplier 
product to be negated sometimes, and normal (non-negated) the other times. One 
can extend the idea presented in the previous section to craft the following 
powerful technique. Let us investigate the original Booth decode table 5000 
depicted in Fig. 50 and the negating Booth decode table 5100 of Fig. 51 a bit 
more closely. Note the horizontal line of symmetry that runs through the midline 
of both decoding tables. This line of symmetry suggests that we can create the 
negating Booth decode table 5100 from the original Booth decode table 5000 by
simply inverting the three bits (b2 b1 b0). For example, if the three bits (b2 b1 b0)
are (0 1 0), then, according to the original Booth decode table, the corresponding
partial product is A, where A is the multiplicand. If we invert the three bits (b2
b1 b0) as suggested above, we will have (1 0 1) and the corresponding partial
product will be -A, exactly what is needed for a negated partial product. 

Given a signal F which specifies when the output of the multiplier should 
be negated and when not (F = 0 implies regular multiplication, F = 1 implies 
negating multiplication), F can simply be XORed with the three bits (b2 b1 b0) at
the input of the regular Booth decoders to make a new conditionally negating
Booth decoder, hence a conditionally negating multiplier. The details of a
conditionally negating Booth decoder 5300 are captured in Fig. 53. Note that with
a minimal amount of hardware (N XOR gates for an N × M multiplier, which is
insignificant compared to the hardware cost of the entire multiplier), we have the
means to control the sign of the multiplier product. Also note that the overall
latency of the multiplier is increased insignificantly since the latency through a
single XOR gate is much smaller than the latency through the entire multiplier.
Furthermore, the latency of a single XOR gate is much smaller than the latency
associated with a carry propagation chain that would be necessary if one built such
a circuit with a two's complement negator. A 10 × M conditionally negating
multiplier 5400 is shown in Fig. 54. 
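
The XOR trick can be verified with a few lines of Python. This sketch assumes the standard radix-4 Booth digit, -2·b2 + b1 + b0, as the decoding rule; the function names are illustrative and the integer model stands in for the decoder hardware of Fig. 53.

```python
def booth_digit(b2, b1, b0):
    """Radix-4 Booth digit selected by one bit triplet (standard recoding)."""
    return -2 * b2 + b1 + b0


def cond_neg_digit(b2, b1, b0, f):
    """Digit obtained when flag f is XORed into each triplet bit before decoding."""
    return booth_digit(b2 ^ f, b1 ^ f, b0 ^ f)


def cond_neg_multiply(multiplier, multiplicand, n_bits, f):
    """Product (f = 0) or negated product (f = 1); n_bits assumed even."""
    u = multiplier & ((1 << n_bits) - 1)
    bits = [0] + [(u >> i) & 1 for i in range(n_bits)]   # implicit 0 below the LSB
    return sum(
        cond_neg_digit(bits[2 * i + 2], bits[2 * i + 1], bits[2 * i], f)
        * multiplicand * 4 ** i
        for i in range(n_bits // 2)
    )


print(cond_neg_multiply(123, -45, 10, 0))   # -5535
print(cond_neg_multiply(123, -45, 10, 1))   # 5535
```

Inverting all three bits maps the digit d to -2(1 - b2) + (1 - b1) + (1 - b0) = -d, so every partial product changes sign and the sum is the negated product.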

5.8.3 Using the Angle Rotation Processor in a Quadrature
Direct Digital Frequency Synthesizer

As mentioned above, the angle rotator is useful in implementing various 
forms of direct digital synthesizers. In this case, all starting points for the angle 
rotations are X_0 = 1, Y_0 = 0 (with, of course, the various usual
interchange/negation requirements). Fig. 55 shows a quadrature direct digital
synthesizer (QDDS) 5500, a system having two outputs, one being samples of a
cosine waveform and the other being samples of a sine waveform. An exact
90-degree phase offset between the two waveforms is obtained by the QDDS, and
numerous applications for such a device are well known. No X_0 and Y_0 input
samples are shown in the Fig. 55 system. These fixed values have been "built in" 
and used to greatly simplify the coarse rotation stage. 

Notice that the angle rotator 5502 is preceded by a system 5504 that 
generates a data stream of input rotation angles, a so-called overflowing phase 
accumulator 5506, and its input is a single fixed data word that precisely controls 
the frequency of the output sample waveforms. The three MSBs of each phase 
accumulator output word, of course, assess the approximate size of the angle that 
is being used as a rotation angle (i.e., these three bits show how many octants the 
rotation angle spans), and they are stripped off to control the interchange/negation 
operations that are appropriate for obtaining the desired output samples. Also, the 
third MSB is used, as described previously, to determine whether or not to 
perform a "two's complement type" inversion of the LSBs. One other operation
is required by the "Conditional Subtract" module 5508 shown in Fig. 55; in
addition to stripping off the three MSBs, it appends one MSB having the value
zero except in the case where a rotation angle of exactly π/4 is required. In that
case, the appended MSB is one and all other ROM-address bits are zero. 
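
A behavioral model helps to see how the three MSBs and the conditional subtraction cooperate. The sketch below is an assumption-laden illustration: `math.cos` and `math.sin` stand in for the coarse/fine angle rotator, the interchange/negation is applied at the output rather than at the rotator inputs, and all names (`phase_accumulator`, `qdds_sample`, `fcw`) are hypothetical.

```python
import math


def phase_accumulator(fcw, width, count):
    """Yield `count` successive outputs of an overflowing phase accumulator."""
    acc = 0
    for _ in range(count):
        yield acc
        acc = (acc + fcw) % (1 << width)   # wrap-around models the overflow


def qdds_sample(phase, width):
    """Map a `width`-bit phase word to (cos, sin) using first-octant values only."""
    frac_bits = width - 3
    octant = phase >> frac_bits                    # the three MSBs
    lsbs = phase & ((1 << frac_bits) - 1)
    if octant & 1:                                 # third MSB set: reflect the angle
        lsbs = (1 << frac_bits) - lsbs             # can equal 2**frac_bits, i.e. exactly pi/4
    angle = (math.pi / 4) * lsbs / (1 << frac_bits)
    c, s = math.cos(angle), math.sin(angle)        # stand-in for the angle rotator
    # Interchange/negation pattern for octants 0..7.
    return [(c, s), (s, c), (-s, c), (-c, s),
            (-c, -s), (-s, -c), (s, -c), (c, -s)][octant]


for p in phase_accumulator(fcw=300, width=10, count=5):
    print(p, qdds_sample(p, 10))
```

When the third MSB is one and the remaining LSBs are all zero, the reflected value equals 2**frac_bits, which needs one extra bit. That is the case in which the appended MSB is one and all other address bits are zero.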

A special case of the QDDS system, one having only a single output data 
stream, which could be either of the two, but which we call the "cosine-only" case, 
is also useful for various well-known applications. Fig. 56 and Fig. 57 show two 
specializations of the angle rotator circuits previously discussed to implement the 
cosine-only DDS. The system 5600 in FIG. 56 results from specializing the 
angle-rotation system 3900 in FIG. 39. The system 5700 in Fig. 57 is a 
specialization of the angle rotator 4400 in FIG. 44. 

5.9 Conclusion 

Based on the design method discussed, for a given accuracy requirement, 
an architecture with the least amount of hardware is produced by balancing the 
precision of intermediate computations and the complexity of each arithmetic 
block, while keeping the output error within the specified bound. Furthermore, our 
architecture consolidates all operations into a small number of reduced-size 
multipliers. This permits us to take advantage of many efficient techniques that 
have been developed for multiplier implementation, such as Booth encoding, 
thereby yielding a smaller and faster circuit than those previously proposed. 

Simulations and preliminary complexity estimation show that, even
compared with the method of (Tan, L. and Samueli, H., IEEE J. Solid-State
Circuits 30:193-200 (1995)), which is optimized for a 14-bit input angle, our method
achieves 6 dB more SFDR while using approximately the same number of
transistors. In addition, since our structure employs only a small
ROM, it overcomes the problem of slow access time that occurs when large 
ROMs are used, thereby facilitating a higher data rate. Using the two-stage 
method, when a higher precision is needed, it is very straightforward to satisfy 
such a requirement, since more accurate results can be attained simply by
increasing the wordlength and the multiplier size. For the single-stage method,
however, when high precision is desired, the required lookup table is likely to be 
too large to be practical, particularly for high-speed operation. 

6. Symbol Synchronization for Bursty Transmissions 

We have thus far discussed methods that provide efficient implementations
of the resampler for symbol synchronization in a digital receiver using
trigonometric interpolation as well as the phase rotator for carrier recovery. To
produce the correct samples, a timing recovery circuit must supply the resampler
with symbol timing information, as shown in Figure 1D. We will now consider
how this can be accomplished.

6.1 Initial Parameter Estimations for Burst Modems

There are many methods to derive timing information from the received 
signal. According to their topologies, synchronization circuits can be divided into 
two categories: feedback and feedforward schemes. Feedback structures
usually have very good tracking performance, and they work quite well in
continuous mode transmissions. For packet data systems used by third-generation
mobile communications, where the transmission is bursty, it is essential to acquire
initial synchronization parameters rapidly from the observation of a short
signal segment.

A typical packet format is shown in Figure 58. It includes a short preamble
5802 followed by user data 5804. The preamble 5802 is a set of known
modulation symbols added to the user data packet at the transmitter with the
intention of assisting the receiver in acquisition.

There are many approaches to burst demodulation, depending on the
specific system requirements. In one approach (Gardner, S., "Burst modem design
techniques, part 1," Electron. Eng. 77:85-92 (Sept. 1999); Gardner, S., "Burst
modem design techniques, part 2," Electron. Eng. 77:75-83 (Dec. 1999)), the
receiver first detects the presence of the preamble, using a correlator, whose
output should produce a large magnitude when the preamble is present. It then
estimates the symbol timing. If the sampling frequency error is small, the total
change of the timing phase from the start of the short preamble to the end is
negligible. Next, it estimates the initial carrier frequency and phase. The above
steps assume that the impairment caused by the channel is small enough that the
modem can successfully track the timing and carrier phase prior to equalization.
Otherwise, equalizer training prior to the timing and carrier recovery is needed.

With a typical preamble of 8 to 32 symbols, depending on the required 
system performance, for QPSK modulation, rapid acquisition is desired. 
Feedforward timing estimation is known to have rapid acquisition, since it 
produces a one-shot estimate instead of tracking the initial timing through a 
feedback loop. 

A well-known method, digital square timing recovery (Oerder M., and 
Meyr, H., IEEE Trans. Comm. 36:605-612 (1988)), has shown rapid acquisition, 
but it requires oversampling of the signal at, typically, four times the symbol rate, 
which imposes a demand for higher processing speed on the subsequent digital 
operations. Moreover, it does not work well for signals employing small excess 
bandwidth. However, pulses with small excess bandwidth are of interest for 
bandwidth-efficient modulation. 

For applications where low power and low complexity are the major 
requirements, such as in personal communications, it is desirable to sample the 
signal at the lowest possible rate and to have the synchronizer be as simple as 
possible. In this section, a synchronizer is proposed that needs just two samples 
per symbol period. In addition, it has been shown to work well for small excess 
bandwidth, which is important for spectral efficiency. Using this method, the 
estimations of the symbol timing and the carrier phase can be carried out 
independently of each other. Hence, they can be carried out in parallel. Using the
proposed structure, the timing and carrier-phase estimators can be implemented
efficiently by means of direct computation (instead of a search, as is employed, for
example, by (Sabel, L., and Cowley, W., "A recursive algorithm for the estimation
of symbol timing in PSK burst modems," in Proc. Globecom 1992, vol. 1 (1992),
pp. 360-364)) using an efficient rectangular-to-polar converter (to be discussed in
Section 7). This yields a very small computation load. Thus, this structure is well
suited for low-power, low-complexity and high-data-rate applications, such as 
those in multimedia mobile communications. 

6.2 Background Information

The system model 5900 used in developing the symbol timing and carrier 
phase recovery algorithm described in this section is shown in Figure 59. 

Here h(t) is a real-valued, unit-energy square-root Nyquist pulse and w(t) 
is complex white Gaussian noise with independent real and imaginary components, 
each having power spectral density N_0/2.

As mentioned in Section 6.1, a typical data packet for a burst modem
consists of a short preamble 5802 followed by user data 5804. According to the
approach of (Gardner, S., Electron. Eng. 77:85-92 (Sept. 1999)), the matched
filter output is sampled every T_s = T/2 seconds, i.e., at twice the symbol rate. The
receiver then detects the presence of the preamble in the received signal by
correlating the conjugate of the known preamble sequence a_m, whose length is L,
with the sampled data x(nT_s) as

$$r_{xx}(n) = \sum_{m=0}^{L-1} a_m^{*}\, x(nT_s + 2mT_s). \tag{6.1}$$

The correlator output r_xx(n) should produce a large magnitude |r_xx(n)| when the
preamble is encountered. The receiver then estimates the initial synchronization
parameters, namely the symbol timing and the carrier phase, assuming the
transmitter/receiver frequency mismatches are insignificant.
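
The preamble correlation can be sketched in a few lines of Python. This is an illustrative model only: it assumes T_s = 1, so that the samples are simply indexed by integers and successive preamble symbols sit two samples apart, and the function name is hypothetical.

```python
def preamble_correlator(x, preamble, n):
    """Correlator output r_xx(n): sum over m of conj(a_m) * x[n + 2m].

    x        : list of complex samples taken at twice the symbol rate
    preamble : known preamble symbols a_0 .. a_{L-1}
    n        : sample index at which the correlation is evaluated
    """
    return sum(a.conjugate() * x[n + 2 * m] for m, a in enumerate(preamble))


# Toy check: embed a 3-symbol preamble at sample offset 3, zeros elsewhere.
preamble = [1, -1, 1j]
x = [0] * 10
for m, a in enumerate(preamble):
    x[3 + 2 * m] = a

print(abs(preamble_correlator(x, preamble, 3)))   # 3.0, the preamble energy
print(abs(preamble_correlator(x, preamble, 2)))   # 0.0, misaligned
```

At the aligned index each term contributes |a_m|^2, so the output magnitude equals the preamble length L for unit-magnitude symbols.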

The complex envelope x(t) of the received signal, after the matched
filter, is

$$x(t) = e^{j\theta} \sum_{k} a_k\, g(t - kT - \tau) + v(t) \tag{6.2}$$

where {a_k} is a sequence of independent equally-probable symbols with
E[|a_k|^2] = 1. We also have that v(t) = w(t) ⊗ h(-t) and that g(t) = h(t) ⊗ h(-t)
is a Nyquist pulse. The time delay τ and the carrier phase θ are both unknown.

To estimate the data sequence a_k we want sample values of x(t) at
t = mT + τ, with m an integer, whereas only the samples x(nT_s) are available after
sampling x(t) by a fixed clock.

Now let us examine how the correlator output relates to symbol timing and
carrier phase. Inserting (6.2) into (6.1) yields

$$\begin{aligned} r_{xx}(n) ={}& \sum_{m=0}^{L-1} \sum_{k} a_m^{*} a_k\, g(nT_s + 2mT_s - kT - \tau)\, e^{j\theta} \\ &+ \sum_{m=0}^{L-1} a_m^{*}\, v(nT_s + 2mT_s). \end{aligned} \tag{6.3}$$

Since the data are independent, and they are independent of the noise, we have

$$E[a_m^{*} a_k] = \begin{cases} 1, & k = m \\ 0, & k \neq m \end{cases} \tag{6.4}$$

$$E[a_m^{*}\, v(nT_s + 2mT_s)] = 0. \tag{6.5}$$

According to (6.4) and (6.5), and because T = 2T_s, the expectation of r_xx(n) with
respect to the data and the noise is (for simplicity, we omit the constant real scale
factor L)

$$E[r_{xx}(n)] = e^{j\theta}\, g(nT_s - \tau). \tag{6.6}$$
Thus, the mean value of the complex preamble correlator output actually
equals the sample of the delayed signaling pulse g(t), with delay being τ, rotated
by the angle θ. This is shown in Figure 60 for θ = 0, where g(t) is a raised cosine
Nyquist pulse with α = 0.35. The total timing delay τ can be expressed as

$$\tau = n_0 T_s + \mu \tag{6.7}$$

where the integer n_0 represents the portion of τ that corresponds to an integer
multiple of the sampling interval T_s and 0 ≤ μ < T_s is the sampling time mismatch.

Most practical signaling pulses g(t) are symmetrical and their peak value 
occurs at g(0). If θ is known, using these properties, we can estimate the
sampling time mismatch μ from the correlator output r_xx(n). In the next section
we will discuss such an algorithm. We will derive this algorithm by first assuming
that θ = 0. Then we will discuss how the method can be carried out independently
of the carrier phase. Simultaneously, we also derive a phase estimation algorithm 
that is independent of the symbol timing. 

6.3 Symbol Timing Estimation Assuming θ = 0

From (6.6), with θ = 0, we have

$$E[r_{xx}(n)] = g(nT_s - \tau). \tag{6.8}$$

According to (6.7) and (6.8), if the transmission delay τ is exactly an integer
multiple of T_s we must have μ = 0, and thus r_xx(n_0) must correspond to the peak
g(0). Otherwise, we have μ ≠ 0, with r_xx(n_0) and r_xx(n_0+1) being the two correlator
output values nearest the peak value g(0), as shown in Figure 60. That is, r_xx(n_0)
and r_xx(n_0+1) must be the two largest correlator outputs. Therefore, once the
largest correlator output is located, we can obtain n_0, the integer part of τ.

We now turn to finding μ. Without loss of generality, let us assume T_s = 1.
Replacing n by n_0 + n we have, according to (6.8) and (6.7),

$$E[r_{xx}(n_0 + n)] = g((n_0 + n) - \tau) = g(n - \mu). \tag{6.9}$$

For simplicity in our discussion on finding the fractional delay μ, we henceforth
drop the index n_0, which corresponds to an integer multiple of sample delays, from
our notation. Next we define R(e^{jω}) as the Fourier transform of r_xx(n):

$$R(e^{j\omega}) = \sum_{n=-\infty}^{\infty} r_{xx}(n)\, e^{-j\omega n}. \tag{6.10}$$

The expectation of R(e^{jω}) can be expressed as

$$E[R(e^{j\omega})] = E[FT(r_{xx}(n))] = FT(E[r_{xx}(n)]). \tag{6.11}$$

According to (6.9), and (C.4) in Appendix C, we have

$$E[R(e^{j\omega})] = FT(g(n - \mu)) = e^{-j\omega\mu}\, G(e^{j\omega}) \tag{6.12}$$

where G(e^{jω}) is the Fourier transform of g(n). Since g(n) is symmetrical, G(e^{jω})
must have zero phase. Thus, according to (6.12),

$$\arg(E[R(e^{j\omega})]) = \arg(e^{-j\omega\mu}\, G(e^{j\omega})) = -\omega\mu. \tag{6.13}$$

Evaluating (6.13) at ω = π/2, we can obtain an estimate of μ as

$$\mu = -\frac{2}{\pi} \arg(R(e^{j\pi/2})). \tag{6.14}$$

Therefore, the unknown sampling mismatch μ can be obtained by taking the
Fourier transform of r_xx(n) and evaluating the phase corresponding to ω = π/2.

To make the implementation of (6.14) realistic, we should truncate the
sequence r_xx(n) before taking its Fourier transform. For example, using only the
four samples r_xx(-1), r_xx(0), r_xx(1), and r_xx(2), we have

$$R_T(e^{j\pi/2}) = [r_{xx}(0) - r_{xx}(2)] + j[r_{xx}(-1) - r_{xx}(1)]. \tag{6.15}$$

Using the correlator output, the μ value can be obtained by first computing
R_T(e^{jπ/2}) according to (6.15), and then from the following:

$$\mu = -\frac{2}{\pi} \arg(R_T(e^{j\pi/2})). \tag{6.16}$$

For low precision requirements, this operation can be accomplished using a small 
CORDIC processor (Chen, A., et al., "Modified CORDIC demodulator
implementation for digital IF-sampled receiver," in Proc. Globecom 1995, vol. 2
(Nov. 1995), pp. 1450-1454) or a ROM lookup table (Boutin, N., IEEE Trans.
Consumer Electron. CE-38:5-9 (1992)). With high accuracy requirements,
however, the CORDIC processor will have long delays, while the table-lookup
method will certainly require a very large ROM. In this case, we propose to use
the rectangular-to-polar converter which will be discussed in Section 7. This
rectangular-to-polar converter requires two small ROMs and it consolidates the
operations into small array-multipliers, which can yield a smaller and faster circuit 
using well-known efficient multiplier implementation techniques. 
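
The four-sample estimate of (6.15) and (6.16) amounts to two subtractions and one angle computation. A minimal Python sketch follows, in which `cmath.phase` stands in for the rectangular-to-polar converter and the function name and argument layout are illustrative assumptions; it assumes T_s = 1 and real-valued correlator outputs (θ = 0).

```python
import cmath
import math


def timing_offset(r, n0):
    """Estimate mu from r[n0-1], r[n0], r[n0+1], r[n0+2] per (6.15)-(6.16)."""
    rt = (r[n0] - r[n0 + 2]) + 1j * (r[n0 - 1] - r[n0 + 1])   # R_T at omega = pi/2
    return -(2.0 / math.pi) * cmath.phase(rt)                  # scale the angle by -2/pi


# Symmetric samples around the peak: the pulse peak lies exactly on sample n0.
print(timing_offset([0.0, 0.5, 1.0, 0.5, 0.0], 2))        # magnitude 0: no mismatch
# Two equal samples straddling the peak: the peak lies halfway between them.
print(timing_offset([0.0, 0.2, 0.9, 0.9, 0.2, 0.0], 2))   # close to 0.5
```

The sample values in the usage lines are made-up numbers chosen only for their symmetry; they are not taken from Figure 60.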

A synchronizer 6100 for implementing the synchronization scheme 
described above is illustrated in Figure 61. The synchronizer 6100 includes a
correlator 6102, a Fourier Transform module 6104, and a rectangular-to-polar
converter 6106. The Fourier transform module 6104 includes various delays and
adders that are known to those skilled in the arts. The rectangular-to-polar 
converter is described further in Section 7. 

The synchronizer 6100 receives data samples associated with sampling one 
or more received symbols and determines an offset πμ/2, where μ represents a
synchronization offset of the data samples relative to the incoming symbols. The 
operation of synchronizer 6100 is described in reference to the flowchart 6200, as 
follows. 

In step 6202, a set of complex data samples is received. 

In step 6204, the correlator 6102 correlates the complex data samples with
a complex conjugate of a preamble data set (a_m*), resulting in correlated complex
data samples.

In step 6206, the Fourier transform module 6104 determines the Fourier
transform of the correlated data samples, according to equations (6.10)-
(6.13) and related equations.

In step 6208, the Fourier transform module 6104 evaluates the Fourier
transform of the correlated data samples at π/2, generating a complex signal
representing a complex number.

In step 6210, the rectangular-to-polar converter 6106 determines an angle
in a complex plane associated with the complex number of step 6208, where the
angle represents synchronization between the data samples and the incoming
symbols.

In step 6212, the angle from step 6210 is scaled by 2/π to determine the
synchronization offset.

6.4 Bias in Symbol Timing Estimation due to Truncating the
Sequence

By truncating the sequence r_xx(n) before taking the Fourier transform, we
have produced a very simple structure to compute μ. However, since R_T(e^{jω})
differs from R(e^{jω}), we must determine how this difference would affect the
estimated μ value. The truncated sequence r_T(n) is related to the original
sequence r_xx(n) as

$$r_T(n) = r_{xx}(n)\, w(n) \tag{6.17}$$

where w(n) is a rectangular function whose Fourier transform is a sinc
function. Thus,

$$R_T(e^{j\omega}) = R(e^{j\omega}) \otimes W(e^{j\omega}). \tag{6.18}$$

Taking the expectation of (6.18) we have

$$E[R_T(e^{j\omega})] = E[R(e^{j\omega})] \otimes W(e^{j\omega}). \tag{6.19}$$

Obviously, the μ value obtained using R_T(e^{jω}) in (6.16) would be different from
that obtained using R(e^{jω}). This will introduce a non-zero timing-jitter mean (bias)
to the μ value obtained using R_T(e^{jω}) instead of R(e^{jω}). But the phase difference
of the expected values of R_T(e^{jπ/2}) and R(e^{jπ/2}) can be computed for a given g(t).

The procedure is as follows: 

1. Given the pulse waveform g(t), obtain, for each value μ, the
samples g(n - μ), n = -1, ..., 2.

2. Compute R_T(e^{jπ/2}) using these samples g(n - μ) according to (6.15).

3. Find the value μ̂ according to (6.16). The difference between the
desired value μ and the value μ̂ computed using the finite samples
g(n - μ), n = -1, ..., 2, is the bias.

This bias is illustrated in Figure 63, where g(t) is a raised cosine Nyquist pulse
with rolloff factor α = 0.1.

From Figure 63, the bias is a function of μ and it can be precalculated and
stored in a ROM in the receiver. Although the size of the ROM depends on the
precision of μ, for typical precision requirements on μ the ROM can be quite
small. Let us illustrate this point using an example: If an 8-bit accuracy is desired
for the bias, the bias value corresponding to the three most significant bits (MSBs)
of μ is indistinguishable from that corresponding to the full-precision μ value.
Hence, we can use only the 3 MSBs of μ to determine the bias, thereby needing
only 8 words in the bias lookup table.

Thus, for each of our symbol timing detector output samples, we can 
obtain the corresponding bias value from the ROM table, then subtract this bias 
from the original timing estimate to obtain an unbiased estimate. 
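
The three-step bias procedure and the small lookup table can be sketched as follows. This is an illustration under stated assumptions, not the receiver's ROM contents: it uses the textbook raised-cosine pulse with T = 2 (so T_s = 1), evaluates each table entry at the center of its bin, and indexes the table with the MSBs of the estimate itself. All function names are hypothetical.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def raised_cosine(t, alpha, T=2.0):
    """Raised-cosine Nyquist pulse g(t); T = 2 because T_s = T/2 = 1."""
    denom = 1.0 - (2.0 * alpha * t / T) ** 2
    if abs(denom) < 1e-12:                      # removable singularity at |t| = T/(2*alpha)
        return (math.pi / 4.0) * sinc(1.0 / (2.0 * alpha))
    return sinc(t / T) * math.cos(math.pi * alpha * t / T) / denom


def timing_bias(mu, alpha):
    """Steps 1-3: estimate from the four samples g(n - mu), minus the true mu."""
    g = {n: raised_cosine(n - mu, alpha) for n in (-1, 0, 1, 2)}
    mu_hat = -(2.0 / math.pi) * math.atan2(g[-1] - g[1], g[0] - g[2])
    return mu_hat - mu


def bias_table(alpha, msbs=3):
    """Precompute 2**msbs bias words, one per bin of mu in [0, 1)."""
    size = 1 << msbs
    return [timing_bias((k + 0.5) / size, alpha) for k in range(size)]


def remove_bias(mu_hat, table):
    """Subtract the table entry addressed by the MSBs of the estimate."""
    k = max(0, min(int(mu_hat * len(table)), len(table) - 1))
    return mu_hat - table[k]


table = bias_table(alpha=0.1)
print(len(table), [round(b, 4) for b in table])
```

By the symmetry of g(t), this model gives zero bias at μ = 0 and μ = 0.5 and equal and opposite bias at μ and 1 - μ, so a real table could exploit that symmetry to store fewer words.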

We have thus far restricted our discussion to the timing recovery algorithm 
for θ = 0. We now consider how this algorithm can be made to accommodate an
arbitrary carrier phase θ.

6.5 Carrier-Independent Symbol Timing Recovery

According to (6.6), with the T_s = 1 normalization, the complex correlator
output r_xx(n) is dependent on θ. Although the expectation of its magnitude

(6.20)

does not depend on θ, it is non-trivial to compute the magnitude of r_xx(n) from its
real and imaginary components. Expressing r_xx(n) in terms of its real and
imaginary components, according to (6.6), we have

$$E[r_{xx}(n)] = g(n - \mu)\cos\theta + j\, g(n - \mu)\sin\theta. \tag{6.21}$$

Thus,

$$E[\mathrm{Re}[r_{xx}(n)]] = g(n - \mu)\cos\theta \tag{6.22}$$

$$E[\mathrm{Im}[r_{xx}(n)]] = g(n - \mu)\sin\theta. \tag{6.23}$$

Since the carrier phase θ does not depend on μ, we can treat cos θ and sin θ as
constant scale factors in Re[r_xx(n)] and Im[r_xx(n)] when we are only concerned with
extracting the timing information.

Clearly, therefore, instead of using the magnitude of the complex r_xx(n)
value, we can use one of its real and imaginary parts, which are available at the
output of the preamble correlator.

We, of course, must decide which of Re[r_xx(n)] and Im[r_xx(n)] to use. If
the unknown phase θ is such that cos θ ≈ 0 it is certainly desirable to use Im[r_xx(n)]
instead of Re[r_xx(n)], and vice versa. But we don't know the θ value thus far.
How do we decide which one to use?

From (6.22) and (6.23) we can see that the relative magnitudes of cos θ
and sin θ can be obtained from the real and imaginary components of r_xx(n). For
example, if |Re[r_xx(n)]| > |Im[r_xx(n)]| we certainly have that |cos θ| > |sin θ|, thus we
should use the real part of the correlator output to find μ. Henceforth we denote
the appropriate (real or imaginary) part of r_xx(n) by r_xx(n).
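
The choice between the real and imaginary parts can be sketched in Python as below. The comparison rule used here, total energy of each set of parts over the samples of interest, is an assumption made for illustration; the text only requires that the part with the larger magnitude be used. The function name is hypothetical.

```python
def select_part(r):
    """Return the real parts or the imaginary parts of the complex correlator
    outputs in r, whichever set carries more energy."""
    re = [z.real for z in r]
    im = [z.imag for z in r]
    if sum(v * v for v in re) >= sum(v * v for v in im):
        return re
    return im


# Carrier phase near 90 degrees: the imaginary parts dominate.
print(select_part([complex(0.1, 0.8), complex(0.2, 1.5)]))     # [0.8, 1.5]
# Carrier phase near 0 degrees: the real parts dominate.
print(select_part([complex(2.0, 0.5), complex(-1.0, 0.25)]))   # [2.0, -1.0]
```

The selected sequence is real-valued, so it can be fed to the four-sample estimate of (6.15) and (6.16) exactly as in the θ = 0 case.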



6.6 Carrier Phase Computation



Next, let us examine the problem of extracting the carrier phase. From
(6.6) we can see that the phase of the complex number E[r_xx(n)] does not depend
on μ. Moreover, the carrier phase can simply be obtained by extracting the phase
of r_xx(n). In order to achieve the best precision, it is desirable to choose the r_xx(n)
value with the largest magnitude for carrier phase estimation. For example, if
r_xx(n_0) is the correlator output with largest squared-magnitude, we choose r_xx(n_0)
to compute

$$\hat{\theta} = \arg(r_{xx}(n_0)). \tag{6.24}$$

One advantage of this approach is that the symbol timing and carrier phase
estimations are independent of each other. They can thus be carried out in parallel.

As for symbol timing estimation in (6.16), the computation in (6.24) can
be accomplished efficiently using the rectangular-to-polar converter to be
discussed in Section 7.
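
The carrier phase estimate reduces to picking the strongest correlator output and taking its angle. In the sketch below, `cmath.phase` stands in for the rectangular-to-polar converter of Section 7, and the function name is an illustrative assumption.

```python
import cmath


def carrier_phase(r):
    """Return (n0, theta_hat): index of the largest-magnitude correlator
    output and the phase of that output."""
    n0 = max(range(len(r)), key=lambda n: abs(r[n]))
    return n0, cmath.phase(r[n0])


# Correlator outputs with pulse samples 0.1, 0.4, 1.0, 0.6 rotated by 0.7 rad.
r = [cmath.rect(g, 0.7) for g in (0.1, 0.4, 1.0, 0.6)]
n0, theta_hat = carrier_phase(r)
print(n0, round(theta_hat, 6))   # 2 0.7
```

The pulse samples in the usage lines are made-up values; only their common rotation angle matters for the phase estimate.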

A synchronizer 6400 for determining timing and phase offsets is shown in 
FIG. 64. Similar to synchronizer 6100, the synchronizer 6400 receives data 
samples associated with sampling one or more received symbols and determines 
a timing offset πμ/2, where μ represents a synchronization offset between the
data samples and the incoming symbols. Additionally, the synchronizer 6400
determines a carrier phase offset represented by θ. The synchronizer 6400 includes
the correlator 6102, sample selectors 6404 and 6406, the Fourier transform
module 6104, and two rectangular-to-polar converters 6106. The operation of
synchronizer 6400 is described in reference to the flowchart 6500 in FIGs. 65A-B,
as follows. The order of the steps in flowchart 6500 is not limiting, as one or more 
steps can be performed simultaneously or in a different order, as will be 
understood by those skilled in the relevant arts. 

In step 6502, a set of complex data samples is received. 

In step 6504, the correlator 6102 correlates the complex data samples with
a complex conjugate of a preamble data set (a_m*), resulting in correlated complex
data samples. Each correlated complex data sample includes a real sample and an
imaginary sample.

In step 6506, the sample set selector 6404 selects either the set of real
correlated samples or the set of imaginary correlated samples. In embodiments, the
set with the larger magnitude is selected.

In step 6508, the Fourier transform module 6104 determines the Fourier
transform of the selected real or imaginary data samples, according to equations
(6.10)-(6.13) and related equations.

In step 6510, the Fourier transform module 6104 evaluates the Fourier
transform at π/2, generating a complex signal representing a complex number.

In step 6512, the rectangular-to-polar converter 6106a determines an angle
in a complex plane associated with the complex number of step 6510, where the
angle represents synchronization between the data samples and the incoming
symbols.

In step 6514, the angle from step 6512 is scaled by 2/π to determine the
synchronization offset.

In step 6516, the selector 6406 selects the largest correlator complex
output. This selection can be based on an examination of one of the parts (real,
imaginary) of the data sequence.

In step 6518, the rectangular-to-polar converter 6106b determines an angle
in a complex plane associated with the complex output of step 6516, where the angle
represents the carrier phase offset θ.

6.7 Simulation Result

We have used the above procedures to estimate the timing delay and the
carrier phase of binary PAM symbols. The pulse shape was raised cosine with
rolloff factor $\alpha = 0.4$. The block size was $L = 32$ preamble symbols. To
demonstrate its performance for signals with small excess bandwidth, we also
tested this method with $\alpha = 0.1$. For a carrier phase offset $\theta = 45°$, we ran the
simulation repeatedly using the synchronizer 6400, each time using a value
randomly chosen between 0 and 1.

In addition to synchronizer 6400, we have also used the following two
well-known methods to estimate the sampling mismatch:

1) the DFT-based square-timing recovery (Oerder, M., and Meyr, H.,
IEEE Trans. Comm. 36:605-612 (1988)),

2) the method of (Gardner, S., Electron. Eng. 77:75-83 (Dec.
1999)) that maps $r(n_0 + 1)/r(n_0)$, the ratio of the two
correlation values nearest the peak (see Figure 60), to the
sampling mismatch value $\mu$.
The variances of the timing jitter using these estimation methods for
$\alpha = 0.4$ and $\alpha = 0.1$ are plotted in Figure 66 and Figure 67, respectively. The
corresponding Cramér-Rao bounds (CRB), the theoretical lower bounds of
estimation errors (Meyr, H., et al., Digital Communication Receivers:
Synchronization, Channel Estimation and Signal Processing, Wiley, New York,
NY (1998)), are also shown. We can see that, in both cases, the timing-jitter
variance using the proposed synchronizer is quite close to the theoretical bound.
It clearly outperforms the other two methods, even for signals employing small
excess bandwidth, as seen in Figure 67.

The variance of the phase estimation error is depicted in Figure 68. It
shows that, using the proposed method, the phase estimation error agrees quite
well with the theoretical bound.

6.8 Conclusion 

A synchronizer for initial symbol timing and carrier phase estimation using
preambles has been presented. This synchronizer requires just two samples per
symbol. Since the two estimations are independent of each other, they can be
carried out simultaneously. These characteristics would ease the demand for
computational speed for high-data-rate applications. Moreover, this synchronizer
has demonstrated very good timing estimation performance even for signals with
small excess bandwidth, which is essential for bandwidth-efficient communications.
The parameter estimations can be implemented very efficiently using the
synchronizer 6400. Due to its simplicity, this method is attractive for applications
where low power and low complexity are desired, such as in a hand-held
transceiver.

7. A High-Speed Processor for Rectangular-to-Polar Conversion 

As discussed previously, the rapid acquisition characteristic of feedforward
symbol synchronizers is essential to symbol synchronization for burst modems.
Many feedforward structures require the evaluation of the phase of a complex
number. An efficient implementation of the phase extraction process is therefore
crucial. In order to handle a wide range of communications problems (Section 8),
a general rectangular-to-polar conversion problem is considered.

There are several well-known implementations for a rectangular-to-polar
coordinate conversion, i.e., obtaining the magnitude and phase of a complex
number. One method uses a ROM lookup table with both the real and imaginary
components as input. This is practical only for low bit-accuracy requirements, as
the ROM size grows exponentially with an increasing number of input bits. To
reduce the ROM size, we can first divide the imaginary by the real component,
then use the quotient to index the lookup table. But the hardware for a full-speed
divider is very complicated and power consuming. An iterative divider
implemented using shifting and subtraction requires less hardware, but it is usually
quite slow. Recently, CORDIC has been applied in this coordinate conversion
(Chen, A., and Yang, S., "Reduced complexity CORDIC demodulator
implementation for D-AMPS and digital IF-sampled receiver," in Proc. Globecom
1998, vol. 3 (1998), pp. 1491-1496). However, due to the sequential nature of
CORDIC, it is difficult to pipeline, thus limiting the throughput rate.

In burst-mode communication systems, rapid carrier and clock
synchronization is crucial (Andronico, M., et al., "A new algorithm for fast
synchronization in a burst mode PSK demodulator," in Proc. 1995 IEEE Int.
Conf. Comm., vol. 3 (June 1995), pp. 1641-1646). Therefore, a fast
rectangular-to-polar conversion is desired. In this section, we present an apparatus and method
that implements the angle computation for rectangular-to-polar conversion with
low latency and low hardware cost. This processor and the polar-to-rectangular
processor presented in Section 5 (see rotator 3900 in FIG. 39), together, can
perform the M-ary PSK modulation devised in (Critchlow, D., "The design and
simulation of a modulatable direct digital synthesizer with non-iterative coordinate
transformation and noise shaping filter," M.S. thesis, University of California, San
Diego (1989)).
7.1 Partitioning the Angle

Figure 69 displays a point in the Cartesian X-Y plane having coordinates
$(X_0, Y_0)$, wherein $X_0$ and $Y_0$ represent the real and imaginary parts of an input
complex signal. The angle $\varphi$ can be computed as

$$\varphi = \tan^{-1}(Y_0/X_0). \tag{7.1}$$
In deriving the core of our algorithm, we assume the dividend and divisor satisfy

$$X_0 \ge Y_0 \ge 0. \tag{7.2}$$

We will discuss how to extend the result to arbitrary values in Section 7.4. To
achieve the highest precision for given hardware, the inputs $X_0$ and $Y_0$ should be
scaled such that

$$1 \le X_0 < 2. \tag{7.3}$$
A straightforward method for fast implementation of (7.1) can be devised
as follows:

1) Obtain the reciprocal of $X_0$ from a lookup table.

2) Compute $Y_0 \times (1/X_0)$ with a fast multiplier.

3) Use this product to index an arctangent table for $\varphi$.

However, the size of the two tables grows exponentially with increased precision
requirements on $\varphi$, and rather large tables would be required to achieve accurate
results. Therefore, for high-precision applications, such an implementation seems
impractical.

If we approximate $1/X_0$ by the reciprocal of the most significant bits
(MSBs) of $X_0$, denoted by $[X_0]$, then the required reciprocal table is much smaller.
We can then multiply the table output by $Y_0$ to yield $Y_0/[X_0]$, which is an
approximation of $Y_0/X_0$. This quotient can then be used to index an arctangent
table. Similar to the reciprocal table, a much smaller arctangent table is needed
if we use only the MSBs of $Y_0/[X_0]$, denoted by $[Y_0/[X_0]]$, to address the table,
which returns $\varphi_1 = \tan^{-1}([Y_0/[X_0]])$. Obviously, this result is just an
approximation to $\varphi$. We will subsequently refer to the computation of $\varphi_1$ as the
coarse computation stage.

Let $\varphi_2$ be the difference between $\varphi$ and $\varphi_1$. Using the trigonometric
identity

$$\tan\varphi_2 = \tan(\varphi - \varphi_1) = \frac{\tan\varphi - \tan\varphi_1}{1 + \tan\varphi \cdot \tan\varphi_1} \tag{7.4}$$

and the definitions $\tan\varphi = Y_0/X_0$ and $\tan\varphi_1 = [Y_0/[X_0]]$, we have

$$\tan\varphi_2 = \frac{Y_0 - X_0 \times [Y_0/[X_0]]}{X_0 + Y_0 \times [Y_0/[X_0]]}. \tag{7.5}$$

Using this relationship, $\varphi_2$ can be determined from $[Y_0/[X_0]]$, the coarse
computation result. Therefore, the desired result $\varphi$ can be obtained by adding the
fine correction angle $\varphi_2$ to the coarse approximation $\varphi_1$. This procedure of finding
$\varphi_2$ will subsequently be referred to as the fine computation stage.
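
The partition into a coarse and a fine angle can be checked numerically. The following Python sketch is illustrative only: the function name `coarse_fine_split` and the parameter `k_bits` are assumptions, and the coarse quotient is formed with an exact division followed by rounding instead of the truncated reciprocal table described above. It shows that the coarse table angle plus the fine angle given by (7.5) reproduces the angle of (7.1).

```python
import math

def coarse_fine_split(x0, y0, k_bits=4):
    """Split atan(y0/x0) into a coarse table angle and a fine residual angle."""
    # Coarse quotient rounded to k_bits fractional bits (stands in for [Y0/[X0]]).
    t = round((y0 / x0) * 2**k_bits) / 2**k_bits
    phi1 = math.atan(t)            # value an arctangent table would return
    x1 = x0 + y0 * t               # denominator of (7.5)
    y1 = y0 - x0 * t               # numerator of (7.5)
    phi2 = math.atan2(y1, x1)      # exact fine angle
    return phi1, phi2

phi1, phi2 = coarse_fine_split(1.5, 0.9)
print(phi1 + phi2, math.atan2(0.9, 1.5))
```

The fine angle is small because the rounded quotient differs from the true quotient by at most half a step of the coarse grid.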

By partitioning the computation of (7.1) into two stages, the table size in
the coarse stage can be reduced significantly at the expense of additional
computations, which are handled by the fine stage. Let us now examine the
complexity of the fine stage. To find $\varphi_2$, we can first compute

$$X_1 = X_0 + Y_0 \times [Y_0/[X_0]], \qquad Y_1 = Y_0 - X_0 \times [Y_0/[X_0]] \tag{7.6}$$

and then find $\varphi_2$ as

$$\varphi_2 = \tan^{-1}(Y_1/X_1). \tag{7.7}$$

The computation in (7.6) involves only adders and multipliers, while (7.7)
requires lookup tables. Moreover, it seems we cannot use the same coarse-stage
tables because they have low resolution and thus cannot satisfy the high precision
requirements for the fine angle $\varphi_2$. Now let us analyze $\varphi_2$ to see if there is any
property that can help in this situation.
If $\varphi_1$ is a good approximation of $\varphi$, then $\varphi_2 = \varphi - \varphi_1$ is close to zero. In
view of (7.7), $Y_1/X_1$ should be very small too. This property helps us in two
respects: 1) The difference between $Y_1/X_1$ and $Y_1/[X_1]$ is much smaller than that
between $1/X_1$ and $1/[X_1]$. This suggests that if we use the same low-resolution
reciprocal table as in the coarse stage, the error contributed to the final result will
be very small. We will demonstrate this in the next section. 2) If $Y_1/X_1$ is
sufficiently small to satisfy

$$|Y_1/X_1| = |\tan\varphi_2| < 2^{-N/3} \tag{7.8}$$

where $N$ denotes the desired number of bits in $\varphi$, then

$$\varphi_2 = \tan^{-1}(Y_1/X_1) \approx Y_1/X_1 \tag{7.9}$$

and we can compute $\varphi_2$ without using an arctangent table. This is explained as
follows:

From the Taylor expansion of $\tan^{-1}(Y_1/X_1)$ near $Y_1/X_1 = 0$, we obtain

$$\tan^{-1}(Y_1/X_1) = Y_1/X_1 - (Y_1/X_1)^3/3 + O((Y_1/X_1)^5). \tag{7.10}$$

Since $O((Y_1/X_1)^5)$ is negligible in comparison to $(Y_1/X_1)^3/3$, it can be omitted.
Therefore, if $Y_1/X_1$ is used to approximate $\tan^{-1}(Y_1/X_1)$, an error

$$\Delta_{\tan} = \tan^{-1}(Y_1/X_1) - Y_1/X_1 \approx -(Y_1/X_1)^3/3 \tag{7.11}$$

will occur. However, according to (7.8), $\Delta_{\tan}$ is bounded by

$$|\Delta_{\tan}| < 2^{-N}/3 \tag{7.12}$$

which is very small. This indicates that the approximation (7.9) is quite accurate
if (7.8) is satisfied.
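
As a quick numerical illustration of this bound, the following Python sketch (the function name `small_angle_error` is an assumption made for illustration) evaluates the worst-case error of replacing the arctangent by its argument at the edge of the range allowed by (7.8) and compares it with the bound of (7.12).

```python
import math

def small_angle_error(n_bits):
    """Return (worst |atan(r) - r| for |r| <= 2**(-n_bits/3), bound 2**(-n_bits)/3)."""
    r_max = 2.0 ** (-n_bits / 3)
    # |atan(r) - r| grows monotonically with |r|, so the edge of the range is the worst case.
    worst = abs(math.atan(r_max) - r_max)
    bound = 2.0 ** (-n_bits) / 3
    return worst, bound

for n in (9, 12, 15):
    worst, bound = small_angle_error(n)
    print(n, worst, bound, worst < bound)
```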

From the above analysis, no additional tables are needed for the fine stage
if $\varphi_1$ is sufficiently close to $\varphi$. On the other hand, the better that $\varphi_1$ approximates
$\varphi$, the larger the tables required for its computation become. As mentioned
previously, table size grows exponentially as the precision increases. A good
trade-off is obtained when the result $\varphi_1$ of the coarse stage is just close enough to
$\varphi$ that (7.8) is satisfied, thereby eliminating the additional tables in the fine stage.
A detailed description of a rectangular-to-polar converter that implements the
algorithm follows.

FIG. 71 illustrates a rectangular-to-polar converter 7100 that implements
the coarse and fine rotation described in section 7 herein, including equations
(7.1)-(7.53). The converter 7100 receives a complex input signal 7102 (that represents
a complex number having $X_0$ and $Y_0$ components) and determines the angle $\varphi$,
which represents the position of the complex signal 7102 in the complex plane. In
doing so, the converter 7100 determines a coarse angle computation that is
represented by the angle $\varphi_1$, and performs a fine angle computation represented
by the angle $\varphi_2$. Once $\varphi_1$ is determined, the input complex number 7102 is
conceptually rotated back toward the X-axis to an intermediate complex signal
7115 as represented in FIG. 72, and $\varphi_2$ is determined from intermediate complex
signal 7115. The angles $\varphi_1$ and $\varphi_2$ are added together to determine $\varphi$.

The converter 7100 includes: an input mux 7104, reciprocal ROM 7106,
output demux 7108, an arctan ROM 7110, a multiplier 7112, a butterfly circuit
7114, a scaling shifter 7116, a fine angle computation stage 7124, and an adder
7126. The fine angle computation stage includes a multiplier set 7118, a one's
complementer 7120, and a multiplier 7122.

The ROM 7106 stores reciprocal values of $[X_0]$, wherein $[X_0]$ is defined
as the most significant bits (MSBs) of $X_0$ of the input signal 7102. The reciprocal
of $[X_0]$ is represented as $Z_0$, for ease of reference. As will be shown, the ROM
7106 is re-used to determine the reciprocal of $[X_1]$, where $X_1$ is the real part of the
intermediate complex number 7115 shown in FIG. 71 and FIG. 72. The reciprocal
of $[X_1]$ is represented as $Z_1$, for ease of reference. In embodiments, the ROM 7106
has $2^{N/3+1}$ storage spaces, where $N$ is the number of bits that represents $X_0$ (and $Y_0$)
of the input signal 7102.

The input mux 7104 chooses between $[X_0]$ and $[X_1]$ as an input to the
reciprocal ROM 7106, according to the control 7128. The output demux 7108
couples an output of the ROM 7106 to $Z_0$ or $Z_1$ according to the control 7128.
The control 7128 assures that $Z_0$ receives the stored reciprocal value for $[X_0]$,
and that $Z_1$ receives the stored reciprocal value for $[X_1]$.

The arctan ROM 7110 stores the coarse approximation angle $\varphi_1$ based on
a $[Y_0Z_0]$ input. Therefore, a coarse stage can be described as including the
ROM 7110, the ROM 7106, and the multiplier 7112, as they are used in the
coarse angle computation.

The operation of the converter 7100 is described further with reference to
the flowchart 7300, as follows. The order of the steps in the flowchart 7300 is not
limiting, as one or more of the steps can be performed simultaneously, or in a
different order.

In step 7302, the input complex signal 7102 having an $X_0$ component and
a $Y_0$ component is received. In embodiments, the $X_0$ and $Y_0$ components are N-bit
binary numbers.

In step 7304, the control 7128 causes $Z_0$ to be retrieved from the ROM
7106, where $Z_0$ represents $1/[X_0]$, and wherein $[X_0]$ is the MSBs of $X_0$.

In step 7306, the multiplier 7112 multiplies $Y_0$ of the input complex
number 7102 by $Z_0$, resulting in a $[Z_0Y_0]$ component. The $[Z_0Y_0]$ component is an
approximation of $Y_0/X_0$.

In step 7308, the coarse angle $\varphi_1$ is retrieved from the ROM 7110 based
on $[Z_0Y_0]$, and is sent to the adder 7126. Note that the coarse stage can be
described as including the ROM 7110, the ROM 7106, and the multiplier 7112,
as they are used in the coarse angle computation.

In step 7310, the butterfly circuit 7114 multiplies the input complex signal
7102 by $[Z_0Y_0]$. This causes the input complex signal 7102 to be rotated in the
complex plane toward the real axis to produce the intermediate complex signal
7115 (representing an intermediate complex number), having a real $X_1$ component
and an imaginary $Y_1$ component.

In step 7312, the scaler 7116 scales the $X_1$ component of the intermediate
complex signal so that it is compatible with the reciprocal values stored in the
ROM 7106. The scaler also scales the $Y_1$ component by the same amount.
In step 7314, the control 7128 causes $Z_1$ to be retrieved from the ROM
7106 based on $[X_1]$, where $Z_1$ represents $1/[X_1]$, and wherein $[X_1]$ is the MSBs of
$X_1$. Note that the ROM 7106 is efficiently used twice to calculate two different
reciprocals $Z_0$ and $Z_1$, thereby reducing overall memory size.

In step 7316, the fine angle computation stage 7124 determines the fine
angle $\varphi_2$ based on $Z_1$ and the scaled intermediate complex number 7115. In doing
so, the Newton-Raphson method is emulated in hardware to estimate $\varphi_2$, which
is the arctan of $Z_1Y_1$. More specifically, multiplier set 7118 multiplies $X_1$ and $Y_1$ by $Z_1$.
The ones' (approximating two's) complement 7120 is then determined for $X_1Z_1$,
after which the multiplier 7122 multiplies $(2 - X_1Z_1)$ by $Y_1Z_1$ to determine
$\tan\varphi_2$. Since $\varphi_2$ is a small angle, the value $\tan\varphi_2$ is used as an approximation of $\varphi_2$.

In step 7318, $\varphi_1$ and $\varphi_2$ are added together to get $\varphi$.

A more detailed description of the algorithm follows. 



7.2 The Two-Stage Algorithm

In this section we first analyze how the coarse approximation error
$\varphi_2 = \varphi - \varphi_1$ depends upon the precision of the tables 7106 and 7110, in order to
determine the amount of hardware that must be allocated to the coarse stage.
Next we explore ways to simplify the computations in the fine stage.



7.2.1 Simplification in the Coarse Computation Stage 



The main concern in the coarse stage design is how the lookup table values
are generated to produce as precise results as possible for a given table size. As
mentioned previously, there are two lookup tables:
7.2.1.1 The Reciprocal Table 7106

The input to this table, $1 \le X_0 < 2$, can be expressed as

$$X_0 = 1.x_1 x_2 \ldots x_m \ldots x_N \tag{7.13}$$

where only bits $x_1$ through $x_m$ are used to index the table. To generate the table
value, if we merely truncate $X_0$ as

$$[X_0] = 1.x_1 x_2 \ldots x_m \tag{7.14}$$

then the quantization error $\Delta_{X_0} = X_0 - [X_0]$ is bounded by

$$0 \le \Delta_{X_0} < 2^{-m}. \tag{7.15}$$

Thus, the difference between the table value and $1/X_0$,

$$1/X_0 - 1/[X_0] = ([X_0] - X_0)/([X_0]X_0) \approx -\Delta_{X_0}/X_0^2 \tag{7.16}$$

is bounded by

$$-2^{-m} < 1/X_0 - 1/[X_0] \le 0. \tag{7.17}$$

But if we generate the table value corresponding to

$$[X_0] = 1.x_1 x_2 \ldots x_m 1 \tag{7.18}$$

with a bit "1" appended as the LSB, then the quantization error in (7.15) is
centered around zero:

$$-2^{-m-1} \le \Delta_{X_0} < 2^{-m-1} \tag{7.19}$$

hence, the error in the reciprocal is also centered around zero:

$$-2^{-m-1} < 1/X_0 - 1/[X_0] \le 2^{-m-1}. \tag{7.20}$$

Comparing (7.20) to (7.17), the maximum absolute error is reduced. This is the
technique introduced in (Fowler, D.L., and Smith, J.E., "An accurate high speed
implementation of division by reciprocal approximation," in Proc. 9th Symp. on
Computer Arithmetic (1989), pp. 60-67).
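
The benefit of appending a "1" as the LSB can be illustrated with a short Python sketch. The function name `reciprocal_errors` and the grid of test inputs are assumptions made for illustration; the sketch compares the worst reciprocal error for plain truncation of the index with the worst error when the index is taken at the midpoint of each table cell.

```python
def reciprocal_errors(m, samples=4096):
    """Max |1/x - 1/[x]| over 1 <= x < 2 for truncation vs. an appended '1' LSB."""
    worst_trunc = worst_mid = 0.0
    step = 2.0 ** -m
    for i in range(samples):
        x = 1.0 + i / samples              # dense grid on [1, 2)
        trunc = int(x / step) * step       # keep m fractional bits: 1.x1...xm
        mid = trunc + step / 2             # append a '1' as the new LSB
        worst_trunc = max(worst_trunc, abs(1 / x - 1 / trunc))
        worst_mid = max(worst_mid, abs(1 / x - 1 / mid))
    return worst_trunc, worst_mid

worst_trunc, worst_mid = reciprocal_errors(m=4)
print(worst_trunc, worst_mid)
```

For the same table size, the midpoint index roughly halves the maximum absolute error, in line with the comparison of the two bounds.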
Since the output of the table will be multiplied by $Y_0$, the fewer the bits in
the table value, the smaller the required multiplier hardware. Let the table value
$Z_0$ be generated by rounding $1/[X_0]$ to $n$ bits:

$$Z_0 = 0.1 z_2 z_3 \ldots z_n. \tag{7.21}$$

The quantization error $\Delta_{Z_0} = 1/[X_0] - Z_0$ is then bounded by

$$-2^{-n-1} < \Delta_{Z_0} < 2^{-n-1}. \tag{7.22}$$

Once we have obtained $Z_0$ from the reciprocal table, we can get an
approximation to the quotient $Y_0/X_0$ by computing $Y_0Z_0$. This result is then used
to address the arctangent table for $\varphi_1$.



7.2.1.2 The Arctangent Table 7110 

In order to use a very small table, $Y_0Z_0$ is rounded to $k$ bits to the right of
the radix point to become $[Y_0Z_0]$, with the rounding error bounded by

$$-2^{-k-1} < \Delta_{Y_0Z_0} = Y_0Z_0 - [Y_0Z_0] < 2^{-k-1}. \tag{7.23}$$

Then, $[Y_0Z_0]$ is used to index the arctangent table, which returns the coarse angle
$\varphi_1 = \tan^{-1}([Y_0Z_0])$.

Now we must determine the minimum $m$, $n$ and $k$ values such that (7.8) is
satisfied. First, let us examine $X_1$ and $Y_1$, which are computed using $[Y_0Z_0]$ as

$$X_1 = X_0 + Y_0[Y_0Z_0] \tag{7.24}$$

$$Y_1 = Y_0 - X_0[Y_0Z_0]. \tag{7.25}$$

Dividing (7.25) by (7.24), and then dividing both the numerator and
denominator by $X_0$, we have
$$|Y_1/X_1| = \left|\frac{Y_0/X_0 - [Y_0Z_0]}{1 + (Y_0/X_0)[Y_0Z_0]}\right| \le \left|Y_0/X_0 - [Y_0Z_0]\right|. \tag{7.26}$$

The inequality is true because $X_0 \ge Y_0 \ge 0$ and $[Y_0Z_0] \ge 0$. Taking into account all
the quantization errors in (7.20), (7.22) and (7.23), we can express $Y_0/X_0$ in terms
of $[Y_0Z_0]$ as

$$\begin{aligned}
Y_0(1/X_0) &\approx Y_0\left(1/[X_0] - \Delta_{X_0}/X_0^2\right) \\
&= Y_0\left((Z_0 + \Delta_{Z_0}) - \Delta_{X_0}/X_0^2\right) \\
&= Y_0Z_0 + Y_0\Delta_{Z_0} - Y_0(\Delta_{X_0}/X_0^2) \\
&= [Y_0Z_0] + \Delta_{Y_0Z_0} + Y_0\Delta_{Z_0} - Y_0(\Delta_{X_0}/X_0^2).
\end{aligned} \tag{7.27}$$

Substituting this result into (7.26), we have

$$|Y_1/X_1| \le \left|\Delta_{Y_0Z_0} + Y_0\Delta_{Z_0} - Y_0(\Delta_{X_0}/X_0^2)\right|. \tag{7.28}$$

Since $Y_0(\Delta_{X_0}/X_0^2) = (Y_0/X_0)(\Delta_{X_0}/X_0)$, from (7.2) and (7.19),

$$-2^{-m-1} < Y_0(\Delta_{X_0}/X_0^2) < 2^{-m-1}. \tag{7.29}$$

Also, according to (7.2) and (7.22), we have

$$-2^{-n} < Y_0\Delta_{Z_0} < 2^{-n}. \tag{7.30}$$

Applying (7.23), (7.29) and (7.30) to (7.28), we obtain
$|Y_1/X_1| < 2^{-m-1} + 2^{-n} + 2^{-k-1}$.
If we choose $m \ge N/3 + 1$, $n \ge N/3 + 2$ and $k \ge N/3 + 1$, then

$$|Y_1/X_1| < 0.75 \times 2^{-N/3}. \tag{7.31}$$

Therefore, since the inputs $X_1$ and $Y_1$ to the fine stage satisfy (7.8), no additional
tables are needed for the fine stage. Henceforth we choose $m = N/3 + 1$,
$n = N/3 + 2$ and $k = N/3 + 1$.
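
The coarse stage with these table sizes can be modeled in a few lines of Python. This is a floating-point sketch under stated assumptions, not the hardware design: the function name `coarse_stage`, the use of `math.atan` in place of the arctangent ROM, and the random test inputs are all introduced here for illustration, and N is assumed to be divisible by 3. The loop records the largest ratio |Y1/X1| seen and prints it next to the bound of (7.31).

```python
import math
import random

def coarse_stage(x0, y0, n_bits=9):
    """Coarse stage for 1 <= x0 < 2 and 0 <= y0 <= x0; returns (phi1, x1, y1)."""
    m = n_bits // 3 + 1      # MSBs of X0 that index the reciprocal table
    n = n_bits // 3 + 2      # fractional bits kept in the table value Z0
    k = n_bits // 3 + 1      # fractional bits kept in [Y0*Z0]
    # [X0]: keep m fractional bits and append a '1' LSB, as in (7.18).
    x0_q = math.floor(x0 * 2**m) / 2**m + 2.0 ** -(m + 1)
    z0 = round((1 / x0_q) * 2**n) / 2**n      # reciprocal table value Z0
    t = round(y0 * z0 * 2**k) / 2**k          # [Y0*Z0], the arctangent table index
    phi1 = math.atan(t)                       # stands in for the arctangent ROM
    x1 = x0 + y0 * t                          # (7.24)
    y1 = y0 - x0 * t                          # (7.25)
    return phi1, x1, y1

random.seed(1)
worst = 0.0
for _ in range(10000):
    x0 = 1.0 + random.random()
    y0 = random.random() * x0
    _, x1, y1 = coarse_stage(x0, y0)
    worst = max(worst, abs(y1 / x1))
print(worst, 0.75 * 2 ** (-9 / 3))
```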
7.2.2 Hardware Reduction in the Fine Computation Stage 7124

Since (7.8) is satisfied, we can obtain the fine angle $\varphi_2$ by computing the
quotient $Y_1/X_1$. From (7.24), we have $X_0 \le X_1 < X_0 + Y_0$, hence $1 \le X_1 < 4$. In
order to use the same reciprocal table as in the coarse stage, $X_1$ should be scaled
such that

$$1 \le X_1 < 2. \tag{7.32}$$

This can be satisfied by shifting $X_1$ to the right if $X_1 \ge 2$. Of course, $Y_1$ should also
be shifted accordingly so that $Y_1/X_1$ remains unchanged.

As in the coarse stage, the reciprocal table accepts the $N/3 + 1$ MSBs of $X_1$
and returns $Z_1$. We define the reciprocal error $\delta_1 = 1/X_1 - Z_1$. Since the same
reciprocal table is used as in the coarse stage, $\delta_1$ and $\delta_0$ must have the same
bound. Since

$$\delta_0 = 1/X_0 - Z_0 = 1/X_0 - 1/[X_0] + \Delta_{Z_0} \tag{7.33}$$

we can use (7.20) and (7.22) to obtain

$$-0.75 \times 2^{-N/3-1} < \delta_1 < 0.75 \times 2^{-N/3-1}. \tag{7.34}$$

The bound on $Y_1$ can be found using (7.31) and (7.32):

$$|Y_1| < 0.75 \times 2^{-N/3+1}. \tag{7.35}$$

Now we can obtain the final error bound in approximating $Y_1/X_1$ by $Y_1Z_1$,
according to (7.34) and (7.35), as

$$|Y_1/X_1 - Y_1Z_1| = |Y_1\delta_1| < (0.75)^2 \times 2^{-2N/3}. \tag{7.36}$$

Clearly, this approximation error is too large. To reduce the maximum
error below $2^{-N}$, the bound on $|\delta_1|$ should be approximately $2^{-2N/3}$, which would
require the reciprocal table to accept $2N/3$ bits as input. That is, the table needed
for such a high-resolution input would be significantly larger than the one already
employed by the coarse stage.

To overcome this difficulty, we can apply the Newton-Raphson iteration
method (Koren, I., Computer Arithmetic Algorithms, Prentice Hall, Englewood
Cliffs, NJ (1993)) to reduce the initial approximation error $\delta_1$. We now briefly
explain how this method works. First, let us define the following function:

$$f(Z) = 1/Z - X_1. \tag{7.37}$$

Obviously, we can obtain $Z = 1/X_1$ by solving $f(Z) = 0$. In other words, we can find
$1/X_1$ by searching for the $Z$ value such that $f(Z)$ intersects the Z-axis, as shown in
Figure 70A.

Shifting the Z-axis down by $X_1$, we obtain a new function $f_1(Z) = 1/Z$,
shown in Figure 70B. At $Z_1 = 1/[X_1]$, $Z_1$ being the initial guess, the slope of
$f_1(Z) = 1/Z$ is

$$f_1'(Z_1) = -\frac{1}{Z_1^2}. \tag{7.38}$$

The tangent, shown as the dashed line 7102, intersects the $f_1(Z) = X_1$ line at a new
point $Z_2$. From Figure 70B, $Z_2$ is much closer to the desired value $1/X_1$ than the
initial guess $Z_1$. Let us now find $Z_2$. According to Figure 7-3, we must have

$$\frac{X_1 - 1/Z_1}{Z_2 - Z_1} = -\frac{1}{Z_1^2}. \tag{7.39}$$

Expressing $Z_2$ in terms of $Z_1$ and $X_1$, we have

$$Z_2 = Z_1(2 - X_1Z_1). \tag{7.40}$$

Thus, we can obtain $Z_2$, a more accurate approximation of $1/X_1$, from $Z_1$. One may
wonder how accurate $Z_2$ is in approximating $1/X_1$. Let us now examine the
approximation error bound.

Substituting $Z_1 = 1/X_1 - \delta_1$ into (7.40), we have

$$\begin{aligned}
Z_2 &= (1/X_1 - \delta_1)\left(2 - X_1(1/X_1 - \delta_1)\right) \\
&= 1/X_1 - X_1\delta_1^2.
\end{aligned} \tag{7.41}$$

According to (7.32), (7.34) and (7.35), after one Newton-Raphson iteration, the
error in $Y_1Z_2$ is reduced to

$$|Y_1/X_1 - Y_1Z_2| = |Y_1X_1\delta_1^2| < (0.75)^3 \times 2^{-N}. \tag{7.42}$$

Thus, a rather accurate result is obtained with just one iteration.
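
The effect of a single iteration is easy to reproduce. In the Python sketch below, the function name `refine_reciprocal` and the numeric values are assumptions chosen for illustration (0.75 stands in for a coarse table output). The sketch applies (7.40) once and compares the remaining error with the product of X1 and the square of the initial error.

```python
def refine_reciprocal(x1, z1):
    """One Newton-Raphson step for 1/x1, starting from the initial guess z1."""
    return z1 * (2.0 - x1 * z1)

x1 = 1.37
z1 = 0.75                        # hypothetical coarse guess for 1/1.37
z2 = refine_reciprocal(x1, z1)
delta1 = 1 / x1 - z1             # error of the initial guess
delta2 = 1 / x1 - z2             # error after one iteration
print(delta1, delta2, x1 * delta1**2)
```

Because the error is squared at each step, a table that is accurate to roughly N/3 bits yields a reciprocal accurate to roughly 2N/3 bits after one iteration.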
Finally, the fine angle can be computed by multiplying $Z_2$ by $Y_1$:

$$\varphi_2 \approx Y_1Z_2 = Y_1Z_1(2 - X_1Z_1). \tag{7.43}$$

Although there are three multipliers involved in (7.43), the size of these
multipliers can be reduced with just a slight accuracy loss by truncating the data
before feeding them to the multipliers. The computational procedure of (7.43) is
as follows:

1) The inputs to the fine stage, $X_1$ and $Y_1$, are truncated to $2N/3 + 2$ and
$N + 3$ bits to the right of their radix points, respectively. Since the $N/3 - 1$ MSBs in
$Y_1$ are just sign bits, as indicated by (7.35), they do not influence the complexity
of the multiplier that produces $Y_1Z_1$. The corresponding quantization errors are
bounded by

$$0 \le \Delta_{X_1} < 2^{-2N/3-2} \tag{7.44}$$

$$0 \le \Delta_{Y_1} < 2^{-N-3}. \tag{7.45}$$

2) Both quantized $X_1$ and $Y_1$ are multiplied by $Z_1$.

3) To form $2 - X_1Z_1$, instead of generating the two's complement of
$X_1Z_1$, we can use the ones' complement with only an insignificant error. Since this
error is much smaller than the truncation error in the next step, we
can neglect it.

4) The product $Y_1Z_1$ is truncated to $N + 3$ bits. We would also truncate
the ones' complement of $X_1Z_1$. But since the inverted LSBs of $X_1Z_1$ will be
discarded, we can truncate $X_1Z_1$ to $2N/3 + 2$ bits and then take its ones'
complement. The corresponding quantization errors, as discussed above, are:

$$0 \le \Delta_{X_1Z_1} < 2^{-2N/3-2} \tag{7.46}$$

$$0 \le \Delta_{Y_1Z_1} < 2^{-N-3}. \tag{7.47}$$

After including all the error sources due to simplification, we now analyze
the effects of these errors on the final result $\varphi_2$. Taking the errors into account, we
can rewrite (7.43) as:
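$$\varphi_2 \approx \left((Y_1 - \Delta_{Y_1})(1/X_1 - \delta_1) + \Delta_{Y_1Z_1}\right)\left(2 - (X_1 - \Delta_{X_1})(1/X_1 - \delta_1) + \Delta_{X_1Z_1}\right). \tag{7.48}$$

Expanding this product and neglecting terms whose magnitudes are insignificant,
we have

$$\varphi_2 \approx Y_1/X_1 - Y_1X_1\delta_1^2 + (Y_1/X_1^2)\Delta_{X_1} + (Y_1/X_1)\Delta_{X_1Z_1} - (1/X_1)\Delta_{Y_1} + \Delta_{Y_1Z_1}. \tag{7.49}$$

As mentioned in Section 7.1, $Y_1/X_1$ is an approximation of $\tan^{-1}(Y_1/X_1)$. Its
approximation error, defined in (7.11), is bounded by

$$|\Delta_{\tan}| = \left|(Y_1/X_1)^3/3\right| < (0.75)^3 \times 2^{-N}/3. \tag{7.50}$$

Replacing $Y_1/X_1$ by $\tan^{-1}(Y_1/X_1) + (Y_1/X_1)^3/3$ in (7.49), we have

$$\begin{aligned}
\varphi_2 \approx{} & \tan^{-1}(Y_1/X_1) + (Y_1/X_1)^3/3 - Y_1X_1\delta_1^2 + (Y_1/X_1^2)\Delta_{X_1} + (Y_1/X_1)\Delta_{X_1Z_1} \\
& - (1/X_1)\Delta_{Y_1} + \Delta_{Y_1Z_1}.
\end{aligned} \tag{7.51}$$

The total error, $\epsilon = \varphi_2 - \tan^{-1}(Y_1/X_1)$, is

$$\begin{aligned}
\epsilon ={} & (Y_1/X_1)\left\{(Y_1/X_1)^2/3 - (X_1\delta_1)^2 + \Delta_{X_1}/X_1 + \Delta_{X_1Z_1}\right\} \\
& - (1/X_1)\Delta_{Y_1} + \Delta_{Y_1Z_1}.
\end{aligned} \tag{7.52}$$

All terms in the subtotal $(Y_1/X_1)^2/3 - (X_1\delta_1)^2 + \Delta_{X_1}/X_1 + \Delta_{X_1Z_1}$ other than
$-(X_1\delta_1)^2$ are non-negative. Thus, the lower bound of this subtotal is the minimum value of
$-(X_1\delta_1)^2$, which is $-0.75^2 \times 2^{-2N/3} = -0.56 \times 2^{-2N/3}$ according to (7.34).
Correspondingly, its upper bound is the sum of the maximum values of the other
three terms: $(0.75^2/3 + 2^{-2} + 2^{-2}) \times 2^{-2N/3} = 0.68 \times 2^{-2N/3}$.
Finally, we can obtain the total error bound as:

$$\begin{aligned}
|\epsilon| &< 0.75 \times 2^{-N/3} \times 0.68 \times 2^{-2N/3} + 2^{-N-3} + 2^{-N-3} \\
&= 0.76 \times 2^{-N}.
\end{aligned} \tag{7.53}$$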
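
The two stages can be combined into one end-to-end model. The Python sketch below is a simplified floating-point illustration under stated assumptions, not a bit-level model of converter 7100: the names `recip_table` and `rect_to_polar_angle` are introduced here, `math.atan` stands in for the arctangent ROM, only the table quantization is modeled (the data truncations and the ones' complement of steps 1 through 4 are omitted), and N is assumed to be divisible by 3. Under these assumptions the remaining error sources are the small-angle approximation and the Newton-Raphson residual, so the observed error should stay below the bound of (7.53).

```python
import math
import random

def recip_table(x, n_bits):
    """Model of the shared reciprocal ROM: m index bits in, n fractional bits out."""
    m = n_bits // 3 + 1
    n = n_bits // 3 + 2
    x_q = math.floor(x * 2**m) / 2**m + 2.0 ** -(m + 1)   # MSBs with a '1' appended
    return round((1 / x_q) * 2**n) / 2**n

def rect_to_polar_angle(x0, y0, n_bits=9):
    """Angle of (x0, y0) for 1 <= x0 < 2 and 0 <= y0 <= x0."""
    k = n_bits // 3 + 1
    # Coarse stage: quotient estimate, arctangent lookup, rotation toward the X-axis.
    t = round(y0 * recip_table(x0, n_bits) * 2**k) / 2**k
    phi1 = math.atan(t)                 # stands in for the arctangent ROM
    x1 = x0 + y0 * t
    y1 = y0 - x0 * t
    if x1 >= 2.0:                       # rescale so the same reciprocal table applies
        x1, y1 = x1 / 2, y1 / 2
    # Fine stage: table reciprocal refined by one Newton-Raphson step, as in (7.43).
    z1 = recip_table(x1, n_bits)
    phi2 = y1 * z1 * (2.0 - x1 * z1)
    return phi1 + phi2

random.seed(2)
worst = 0.0
for _ in range(10000):
    x0 = 1.0 + random.random()
    y0 = random.random() * x0
    worst = max(worst, abs(rect_to_polar_angle(x0, y0) - math.atan2(y0, x0)))
print(worst, 0.76 * 2 ** -9)
```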
7.3 Magnitude Calculation

Once the angle of the vector $(X_0, Y_0)$ is known, its magnitude can be
obtained by multiplying $X_0$ by $1/\cos\varphi$, whose values can be pre-calculated and
stored in a ROM, thereby requiring only a single multiplication. However, if we
use all the available bits to index the ROM table, it is likely that a very large ROM
will be needed.

As we know from the preceding discussion, the coarse angle $\varphi_1$ is an
approximation of $\varphi$. Similarly, $1/\cos\varphi_1$ approximates $1/\cos\varphi$. Therefore, we can
expand the coarse-stage ROM 7110 to include also the $1/\cos\varphi_1$ values. That is,
for each input $[Y_0Z_0]$, the coarse-stage ROM would output both $\varphi_1 = \tan^{-1}([Y_0Z_0])$
and $1/\cos\varphi_1$. Since $X_0$ and $Y_0$ satisfy (7.2) and (7.3), the $1/\cos\varphi_1$ value is within
the interval $[1, \sqrt{2}]$.

For many applications, the magnitude value is used only to adjust the
scaling of some signal level, and high precision is not necessary. For applications
where a higher precision is desired, we propose the following approach:

First, instead of using the above-mentioned table of $1/\cos\varphi_1$ values,
we pre-calculate and store in ROM the $1/\cos\varphi_M$ values, where $\varphi_M$ contains only
the $m$ MSBs of $\varphi$. Obviously, only a small table, one of comparable size to the $1/\cos\varphi_1$
table, is needed. Then, we can look up the table entries for the two nearest values
to $\varphi$, namely $\varphi_M$ and $\varphi'_M = \varphi_M + 2^{-m}$. A better approximation of $1/\cos\varphi$
can then be obtained by interpolating between the table values $1/\cos\varphi_M$ and $1/\cos\varphi'_M$
as

$$1/\cos\varphi \approx 1/\cos\varphi_M + \frac{1/\cos\varphi'_M - 1/\cos\varphi_M}{\varphi'_M - \varphi_M} \times (\varphi - \varphi_M). \tag{7.54}$$

Let $\varphi_L = \varphi - \varphi_M$. Obviously, $\varphi_L$ simply contains the LSBs of $\varphi$. We can now
rewrite (7.54) as

$$1/\cos\varphi \approx 1/\cos\varphi_M + (1/\cos\varphi'_M - 1/\cos\varphi_M) \times \varphi_L \times 2^m \tag{7.55}$$

which involves only a multiplication and a shift operation, in addition to two
adders.
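
The interpolation of (7.55) can be sketched in Python as follows. The function names `inv_cos_interp` and `magnitude` are assumptions made for illustration, the two table entries are computed on the fly instead of being read from a ROM, the angle is assumed to be expressed in radians with m fractional MSBs retained, and `math.atan2` stands in for the angle produced by the converter.

```python
import math

def inv_cos_interp(phi, m=4):
    """Approximate 1/cos(phi) by linear interpolation between two table entries."""
    step = 2.0 ** -m
    phi_m = math.floor(phi / step) * step      # the m fractional MSBs of phi
    phi_l = phi - phi_m                        # the remaining LSBs of phi
    lo = 1 / math.cos(phi_m)                   # table entry at phi_M
    hi = 1 / math.cos(phi_m + step)            # table entry at phi_M + 2**-m
    return lo + (hi - lo) * phi_l * 2**m       # one multiplication and a shift

def magnitude(x0, y0, m=4):
    """Magnitude of (x0, y0) for x0 >= y0 >= 0, computed as x0 * (1/cos(phi))."""
    phi = math.atan2(y0, x0)                   # stands in for the converter output
    return x0 * inv_cos_interp(phi, m)

print(magnitude(1.5, 0.9), math.hypot(1.5, 0.9))
```

Increasing `m` shrinks the interpolation interval and therefore the error, at the cost of a larger table.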

7.4 Converting Arbitrary Inputs

In previous sections we have restricted the input values to lie within the
bounds of (7.2) and (7.3). However, if the coordinates of $(X_0, Y_0)$ do not satisfy
those conditions, we must map the given point to one whose coordinates do. Of
course, the resulting angle must be modified accordingly. To do that, we replace
$X_0$ and $Y_0$ by their absolute values. This maps $(X_0, Y_0)$ into the first quadrant. Next,
the larger of $|X_0|$ and $|Y_0|$ is used as the denominator in (7.1) and the other as the
numerator. This places the corresponding angle in the interval $[0, \pi/4]$. We can
now use the procedure discussed previously to obtain $\varphi$. Once we get $\varphi$, we can
find the angle $\phi$ that corresponds to the original coordinates from $\varphi$. First, if
originally $|X_0| < |Y_0|$, we should map $\varphi$ to $[\pi/4, \pi/2]$ using $\phi' = \pi/2 - \varphi$.
Otherwise $\phi' = \varphi$. We then map this result to the original quadrant according to
Table 7.1.

Next, let us examine the effect of the above-mentioned mapping on
the magnitude calculation. Since the negation and exchange of the original
$X_0$ and $Y_0$ values do not change the magnitude, whose value is $(X_0^2 + Y_0^2)^{1/2}$, the
result obtained using the $X_0$ and $Y_0$ values after the mapping needs no correction.
However, if the input values were scaled to satisfy (7.3), we then need to scale the
computed magnitude to the original scale of $X_0$ and $Y_0$.



Table 7.1 Converting Arbitrary Inputs

Original coordinates        Modification to the angle
$X_0 < 0$, $Y_0 > 0$        $\phi = \pi - \phi'$
$X_0 < 0$, $Y_0 < 0$        $\phi = \pi + \phi'$
$X_0 > 0$, $Y_0 < 0$        $\phi = 2\pi - \phi'$
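
The mapping of this section can be summarized in a short Python sketch. The function name `angle_any_quadrant` and its `core` parameter are assumptions made for illustration: `core` stands for any routine that returns the angle for inputs with $X_0 \ge Y_0 \ge 0$, and `math.atan2` is used as a stand-in by default. Points lying exactly on an axis are not covered by Table 7.1; assigning them to the neighboring quadrant as done below is a choice made here.

```python
import math

def angle_any_quadrant(x, y, core=None):
    """Angle in [0, 2*pi) of (x, y), built from a converter valid for x >= y >= 0."""
    if core is None:
        core = lambda xa, ya: math.atan2(ya, xa)   # stand-in for the core converter
    ax, ay = abs(x), abs(y)                        # map into the first quadrant
    if ax >= ay:
        phi_p = core(ax, ay)                       # angle already in [0, pi/4]
    else:
        phi_p = math.pi / 2 - core(ay, ax)         # swap roles, map to [pi/4, pi/2]
    if x >= 0 and y >= 0:
        return phi_p                               # first quadrant: no change
    if x < 0 and y >= 0:
        return math.pi - phi_p                     # second quadrant
    if x < 0 and y < 0:
        return math.pi + phi_p                     # third quadrant
    return 2 * math.pi - phi_p                     # fourth quadrant

for point in [(1.0, 1.0), (-1.0, 2.0), (-3.0, -1.0), (2.0, -5.0)]:
    print(point, angle_any_quadrant(*point))
```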



7.5 Test Result 



We have verified our error bound estimation by a bit-level simulation of
the rectangular-to-polar converter 7100. To test the core algorithm described in
Section 7.2, we generated the pair of inputs $X_0$ and $Y_0$ randomly within the range
described by (7.2) and (7.3). This test was run repeatedly over many different
values of $X_0$ and $Y_0$, and the maximum error value was recorded. Choosing $N = 9$
for this simulation, the error bound estimate according to (7.53) is 0.0015. Our
test results yielded the error bounds [-0.00014, 0.00051], well within the
calculated bound.

7.6 Conclusion

An efficient rectangular-to-polar converter is described. The angle
computation of a complex number is partitioned into coarse and fine
computational stages. Very small arctangent and reciprocal tables are used to
obtain a coarse angle. These tables should provide just enough precision such that
the remaining fine angle is small enough to approximately equal its tangent value.
Therefore the fine angle can be obtained without a look-up table. The
computations are consolidated into a few small multipliers, given a precision
requirement. While a low-precision magnitude can be obtained quite simply, a
high-precision result can be achieved by combining the angle computation with the
angle rotation processor 3900 of Section 5.

The applications of this converter include the implementation of the
converter 6106 in the symbol synchronizer 6100 and the synchronizer 6400.
However, the converter is not limited to symbol synchronization. It also provides
efficient implementation of computational tasks for many communication systems,
such as constant-amplitude FSK and PSK modems (Chen, A., and Yang, S.,
"Reduced complexity CORDIC demodulator implementation for D-AMPS and
digital IF-sampled receiver," in Proc. Globecom 1998, vol. 3 (1998), pp.
1491-1496; Boutin, N., IEEE Trans. Consumer Electron. 38:5-9 (1992)), DMT
modems (Arivoli, T., et al., "A single chip DMT modem for high-speed WLANs,"
in Proc. 1998 Custom Integrated Circuits Conf. (May 1998), pp. 9-11), as well
as carrier synchronization (Andronico, M., et al., "A new algorithm for fast
synchronization in a burst mode PSK demodulator," in Proc. 1995 IEEE Int.
Conf. Comm., vol. 3 (June 1995), pp. 1641-1646; Fitz, M.P., and Lindsey, W.C.,
IEEE Trans. Comm. 40:1644-1653 (1992)), where the computation of phase and
magnitude from the rectangular coordinates is essential.

8. Exemplary Computer System 

Embodiments of the invention may be implemented using hardware,
software or a combination thereof and may be implemented in a computer
system or other processing system. In fact, in one embodiment, the invention
is directed toward a software and/or hardware embodiment in a computer
system. An example computer system 7702 is shown in FIG. 77. The
computer system 7702 includes one or more processors, such as processor
7704. The processor 7704 is connected to a communication bus 7706. The
invention can be implemented in various software embodiments that can operate
in this example computer system. After reading this description, it will become
apparent to a person skilled in the relevant art how to implement the invention
using other computer systems and/or computer architectures.

Computer system 7702 also includes a main memory 7708, preferably
a random access memory (RAM), and can also include a secondary memory or
secondary storage 7710. The secondary memory 7710 can include, for
example, a hard disk drive 7712 and a removable storage drive 7714,
representing a floppy disk drive, a magnetic tape drive, an optical disk drive,
etc. The removable storage drive 7714 reads from and/or writes to a
removable storage unit 7716 in a well known manner. Removable storage unit
7716 represents a floppy disk, magnetic tape, optical disk, etc., which is read
by and written to by removable storage drive 7714. As will be appreciated,
the removable storage unit 7716 includes a computer usable storage medium
having stored therein computer software and/or data.

In alternative embodiments, secondary memory 7710 may include other
similar means for allowing computer software and data to be loaded into
computer system 7702. Such means can include, for example, a removable
storage unit 7720 and a storage interface 7718. Examples of such can include
a program cartridge and cartridge interface (such as that found in video game
devices), a removable memory chip (such as an EPROM, or PROM) and
associated socket, and other removable storage units 7720 and interfaces 7718
which allow software and data to be transferred from the removable storage unit
7720 to the computer system 7702.

Computer system 7702 can also include a communications interface
7722. Communications interface 7722 allows software and data to be
transferred between computer system 7702 and external devices 7726.
Examples of communications interface 7722 can include a modem, a network
interface (such as an Ethernet card), a communications port, a PCMCIA slot
and card, etc. Software and data transferred via communications interface 7722
are in the form of signals, which can be electronic, electromagnetic, optical or
other signals capable of being received by the communications interface 7722.
These signals are provided to the communications interface 7722 via a channel
7724. This channel 7724 can be implemented using wire or cable, fiber optics,
a phone line, a cellular phone link, an RF link and other communications
channels.

Computer system 7702 may also include well known peripherals 7703
including a display monitor, a keyboard, a printer and facsimile, and a
pointing device such as a computer mouse, track ball, etc.

In this document, the terms "computer program medium" and "computer
usable medium" are used to generally refer to media such as the removable
storage devices 7716 and 7718, a hard disk installed in hard disk drive 7712,
semiconductor memory devices including RAM and ROM, and associated
signals. These computer program products are means for providing software
(including computer programs that embody the invention) and/or data to
computer system 7702.

Computer programs (also called computer control logic or computer
program logic) are generally stored in main memory 7708 and/or secondary
memory 7710 and executed therefrom. Computer programs can also be
received via communications interface 7722. Such computer programs, when
executed, enable the computer system 7702 to perform the features of the
present invention as discussed herein. In particular, the computer programs,
when executed, enable the processor 7704 to perform the features of the present
invention. Accordingly, such computer programs represent controllers of the
computer system 7702.

In an embodiment where the invention is implemented using software, the
software may be stored in a computer program product and loaded into
computer system 7702 using removable storage drive 7714, hard drive 7712 or
communications interface 7722. The control logic (software), when executed
by the processor 7704, causes the processor 7704 to perform the functions of
the invention as described herein.

In another embodiment, the invention is implemented primarily in
hardware using, for example, hardware components such as application specific
integrated circuits (ASICs), stand alone processors, and/or digital signal
processors (DSPs). Implementation of the hardware state machine so as to
perform the functions described herein will be apparent to persons skilled in the
relevant art(s). In embodiments, the invention can exist as software operating
on these hardware platforms.

In yet another embodiment, the invention is implemented using a
combination of both hardware and software.

9. Appendices 

The following Appendices are included. 

9.1 Appendix A: Proof of the Zero ISI Condition

In Section 3.4 we examined the condition for a real-valued function
f(t) to have zero crossings at integer multiples of the sampling period, i.e., for
f(t) satisfying (3.16). We stated that f(t) satisfies (3.16) if and only if F(k),
defined as samples of F(Ω) in (3.8) (F is the frequency response of f), satisfy
(3.18). Here, -∞ < k < ∞ is an integer. We now provide the proof.

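Proof. First let us define a periodic extension of f(t), of period N, as

$$ f_c(t) = \sum_{n=-\infty}^{\infty} f(t - Nn). \tag{A.1} $$

Its Fourier transform consists of a sequence of impulses

$$ F_c(\Omega) = \sum_{k=-\infty}^{\infty} F(k)\,\delta\!\left(\Omega - \frac{2\pi}{N}k\right). \tag{A.2} $$

Next, consider an impulse chain

$$ c(t) = \sum_{n=-\infty}^{\infty} \delta(t - n) \tag{A.3} $$

whose Fourier transform is

$$ C(\Omega) = \sum_{k=-\infty}^{\infty} \delta(\Omega - 2\pi k). \tag{A.4} $$

The convolution $F_c \otimes C$ can be expressed as

$$ F_c(\Omega) \otimes C(\Omega) = F_c(\Omega) \otimes \sum_{m=-\infty}^{\infty} \delta(\Omega - 2\pi m) = \sum_{m=-\infty}^{\infty} F_c(\Omega - 2\pi m). \tag{A.5} $$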

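Substituting (A.2) into (A.5) yields:

$$ \begin{aligned} F_c(\Omega) \otimes C(\Omega) &= \sum_{m=-\infty}^{\infty} \left[ \sum_{k=-\infty}^{\infty} F(k)\,\delta\!\left(\Omega - 2\pi m - \frac{2\pi}{N}k\right) \right] \\ &= \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} F(k)\,\delta\!\left(\Omega - \frac{2\pi}{N}(k + Nm)\right) \\ &= \sum_{k=-\infty}^{\infty} \left( \sum_{m=-\infty}^{\infty} F(k - Nm) \right) \delta\!\left(\Omega - \frac{2\pi}{N}k\right). \end{aligned} \tag{A.6} $$

Therefore, we have the following relationships:

$$ \begin{aligned} (3.16) &\Leftrightarrow f_c(t) = \begin{cases} 1, & t = Nm,\ m \text{ an integer} \\ 0, & \text{otherwise} \end{cases} \\ &\Leftrightarrow f_c(t)\,c(t) = \sum_{m=-\infty}^{\infty} \delta(t - Nm) \\ &\Leftrightarrow F_c(\Omega) \otimes C(\Omega) = \sum_{k=-\infty}^{\infty} \delta\!\left(\Omega - \frac{2\pi}{N}k\right) \\ &\overset{(A.6)}{\Leftrightarrow} \sum_{k=-\infty}^{\infty} \left( \sum_{m=-\infty}^{\infty} F(k - Nm) \right) \delta\!\left(\Omega - \frac{2\pi}{N}k\right) = \sum_{k=-\infty}^{\infty} \delta\!\left(\Omega - \frac{2\pi}{N}k\right) \\ &\Leftrightarrow \sum_{m=-\infty}^{\infty} F(k - Nm) = 1 \\ &\Leftrightarrow (3.18). \end{aligned} $$

This concludes the proof.

9.2 Appendix B: Impulse Response of the Simplified Interpolators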



In Section 2, after introducing a preliminary interpolation method, we
showed that we can trade one angle rotator for a multiplier by conceptually
modifying the input samples, then "correcting" the interpolated value obtained
from the "modified" samples. This yields a simpler implementation structure as
well as better performance in interpolating most practical signals. We now
derive the impulse response of this interpolation filter. As discussed in Section 2,
the interpolated sample is computed as



1 N/2-l 

y(»)=-^ £^ kf, -K M (b.i) 



k=-N/2+l 



where K is defined in (2.30), and 



Nil N/2 
m=-N12+l m=-NI2+l 



where k = 0, ..., N/2-1.

Substituting (2.30) into (B.2), we have 



N/2 N/2 f j N/2 \ 

m=-N/2+\ m=-N/2+l^ iV n=- N/2+1 J 

N/2 N/2 f N/2 2 \ 

= lAriW?- I I m-W?\(-lYy(n) (B.3 

m=-N/2+\ n=-N/2+l^ m=-N /2+1 ' 

N/2 N/2 f N/2 9 ^ 

m=-N/2+\ m=-N/2+l^n=-N/2+l iV J 
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y{rri)W- N k " 



Replacing K and c_k in (B.1) by (2.30) and (B.3), respectively, we have

I N/2-1 Nil r N/2 2 

= Jj E Z I »T7»^(-1)" 

iV k^-N/2+1 m=-N/2+\\ n=-N/2+l ^ 

2 Nn 
+ T7 Z (-1)">W)/* 

iV m=- N/2+1 
N/2 f i tf/2-1 

= Z y(4^ Z c*- 4 ) 

/»— JV72+1 V jfc=-jV/2+l 



2 ( Nil t» N/2-1 ^^ 



, I i I 

V»=-///2+l iV k=- N/2+1 j 



N/2 



m=- N/2+1 



where f(t), the impulse response of the simplified interpolation filter discussed in 
Section 2.6.1, is now defined as 

N/2-1 



1 N/2-1 

/(0=T7 Z 



N k=~N/2+\ 

2 



N/2-1 „ N/2-1 



N (rV\ Z jr Z »]?-^-(r+m) 

iy ^n=-N/2+l IW k=-N/2+l 



(B.5) 



for -m < t < 1 - m, m = -N/2+1, ..., N/2. Otherwise, f(t) = 0.

The frequency response, of course, can be obtained by taking the Fourier
transform of f(t). To modify the frequency response of f(t), we can multiply the
c_k coefficient in (B.3) by a value denoted by F_μ(k). In designing an optimal
interpolation filter as discussed in Section 4, we search for the F_μ(k) value that
minimizes (4.4), i.e., such that F_μ(ω) most accurately approximates the desired
response (4.3).

9.3 Appendix C: Fourier Transform of g(nT_s - μ)

Since, in Section 6, the g(nT_s) are samples of the continuous-time pulse
g(t), assuming, without loss of generality, that T_s = 1, it is well known (Freeman,
H., Discrete-Time Systems, Wiley, New York, NY (1965)) that the Fourier
transforms are related as

$$ G(e^{j\omega}) = \sum_{k=-\infty}^{\infty} G(\omega + 2\pi k) \tag{C.1} $$

where $G(e^{j\omega})$ and $G(\omega)$ are the Fourier transforms of g(n) and g(t),
respectively. Since g(t) is bandlimited, i.e., $|G(\omega)| = 0$ for $|\omega| > \pi$, we have

$$ G(e^{j\omega}) = G(\omega), \qquad -\pi < \omega < \pi. \tag{C.2} $$

Using the Fourier transform's time-shifting property, the Fourier transform of
g(t - μ) is

$$ e^{-j\omega\mu}\, G(\omega). \tag{C.3} $$

Since the g(n - μ) are samples of g(t - μ), for the same reason as above, their
Fourier transforms are the same in the interval $-\pi < \omega < \pi$, as in (C.2). Thus,
according to (C.2) and (C.3) we have

$$ \mathrm{FT}(g(n - \mu)) = e^{-j\omega\mu}\, G(e^{j\omega}), \qquad -\pi < \omega < \pi. \tag{C.4} $$
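
The time-shift relation for samples of a bandlimited pulse can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: it uses a hypothetical periodic, bandlimited test signal `g` (so that a finite DFT stands in for the Fourier transform), an arbitrary offset `mu`, and the standard time-shifting property with phase factor exp(-j*omega*mu).

```python
import cmath
import math

def dft(samples):
    """Plain O(P^2) discrete Fourier transform of a list of samples."""
    n_pts = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * n / n_pts)
                for n, s in enumerate(samples))
            for k in range(n_pts)]

def g(t):
    """Hypothetical test pulse: period 16, highest frequency 3*(2*pi/16) < pi."""
    return (1.0 + 0.8 * math.cos(2 * math.pi * t / 16 + 0.3)
            + 0.5 * math.sin(2 * math.pi * 3 * t / 16))

def wrapped_omega(k, n_pts):
    """Map DFT bin k to its frequency in the interval [-pi, pi)."""
    k_signed = k if k < n_pts / 2 else k - n_pts
    return 2 * math.pi * k_signed / n_pts

P = 16          # samples per period (T_s = 1)
mu = 0.37       # fractional delay, an arbitrary example value

G = dft([g(n) for n in range(P)])             # transform of g(n)
G_shift = dft([g(n - mu) for n in range(P)])  # transform of g(n - mu)

# Largest deviation between the shifted-sample transform and the phase-rotated
# transform of the unshifted samples, over all bins.
max_dev = max(abs(G_shift[k] - cmath.exp(-1j * wrapped_omega(k, P) * mu) * G[k])
              for k in range(P))
```

For this test signal `max_dev` is at the level of floating-point rounding, which is the discrete counterpart of the statement that the two transforms agree on the interval between -π and π.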

9.4 Appendix D: Interpolation on Non-Center Intervals

When we first discussed the interpolation problem in Section 2, we
focused on interpolating between the two samples in the middle of the set of
samples used to generate a synchronized sample. What is the impact on
interpolation performance when we interpolate in an interval not at the center of
the samples being used? Figure 74 shows such an example for N = 4, where the
interpolation is performed between y(0) and y(1) using y(-2), y(-1), y(0) and y(1)
(as opposed to using y(-1), y(0), y(1) and y(2), as seen in Figure 2-1).

Using the procedure described in Section 2, given N samples y(n),
n = -N+2, ..., 1, we first compute the Fourier coefficients as

$$ c_k = \sum_{n=-N+2}^{1} y(n)\, W_N^{kn}, \qquad k = 0, \ldots, N/2. \tag{D.1} $$

Comparing (D.1) to (2.9), their only difference is the range of the summation. As
in Section 2, for a given offset 0 ≤ μ < 1, the synchronized sample y(μ) can be
computed as:



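$$ y(\mu) = \frac{1}{N}\,\mathrm{Re}\!\left( c_0 + 2\sum_{k=1}^{N/2-1} c_k W_N^{-k\mu} + c_{N/2} W_N^{-N\mu/2} \right). \tag{D.2} $$

We can express y(μ) in terms of y(n) by substituting (D.1) into (D.2), as

$$ \begin{aligned} y(\mu) &= \frac{1}{N}\sum_{n=-N+2}^{1} y(n)\left[ 1 + 2\sum_{k=1}^{N/2-1}\cos\frac{2\pi k}{N}(\mu - n) + \cos\pi(\mu - n) \right] \\ &= \frac{1}{N}\sum_{n=-N+2}^{1} y(n)\, f(\mu - n) \end{aligned} \tag{D.3} $$

where

$$ f(t) = \begin{cases} 1 + 2\displaystyle\sum_{k=1}^{N/2-1}\cos\frac{2\pi k}{N}t + \cos\pi t, & -1 \le t < N-1 \\ 0, & \text{otherwise} \end{cases} \tag{D.4} $$

is the impulse response of the corresponding interpolation filter. For N = 4, f(t) is
plotted in Figure 75A. Taking the Fourier transform of f(t), we obtain the
corresponding frequency response, which is shown in Figure 76A.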
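
As a hedged illustration of the non-center interpolation just described, the following Python sketch evaluates the interpolated sample directly from the cosine kernel 1 + 2 Σ cos(2πkt/N) + cos(πt), with the 1/N factor applied outside the sum. The function names and the example sample values are assumptions made for this sketch; the samples are taken to be y(-N+2), ..., y(1) with N even.

```python
import math

def kernel(t, n_samples):
    """Cosine kernel 1 + 2*sum_{k=1}^{N/2-1} cos(2*pi*k*t/N) + cos(pi*t)."""
    total = 1.0 + math.cos(math.pi * t)
    for k in range(1, n_samples // 2):
        total += 2.0 * math.cos(2.0 * math.pi * k * t / n_samples)
    return total

def interpolate_noncenter(samples, mu):
    """Interpolate y(mu), 0 <= mu <= 1, from samples y(-N+2), ..., y(1).

    samples[i] holds y(n) for n = -N + 2 + i, where N = len(samples) is even.
    """
    n_samples = len(samples)
    first_index = -n_samples + 2
    acc = 0.0
    for i, y_n in enumerate(samples):
        n = first_index + i
        acc += y_n * kernel(mu - n, n_samples)
    return acc / n_samples

# Example with N = 4: y(-2), y(-1), y(0), y(1)
example = [0.2, -0.5, 0.9, 0.4]
midpoint = interpolate_noncenter(example, 0.5)
```

At μ = 0 and μ = 1 the kernel vanishes at every sample except y(0) and y(1) respectively, so the interpolator passes through the two samples that bound the interval, and a constant input is reproduced for any μ.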

Comparing the Figure 76A frequency response to Figure 7A, both for
N = 4, we can see that the interpolation performance degraded significantly, as
shown by the ripples in the passband and large sidelobes in the stopband in Figure
76A, when the interpolation is not done in the center interval.

However, using the optimization method discussed in Section 4, we can
"reshape" the impulse response f(t) such that the corresponding frequency
response is a better approximation to the ideal interpolation frequency response.

The impulse response of an optimized interpolation filter for a non-center
interval is illustrated in Figure 75B. The corresponding frequency response is
shown in Figure 76B, which is clearly better than Figure 76A, since it has less
ripple in the passband and more attenuation in the stopband.

Using samples y(-N+2), ..., y(-1), and y(0) to interpolate obviously
reduces the latency in generating synchronized samples, as compared to using
samples y(-N/2+1), ..., y(0), ..., y(N/2), since the interpolator does not have to
wait until samples y(1), ..., y(N/2) become available. In applications where low
latency takes a higher priority than interpolation accuracy, this approach will be
useful.

9.5 Appendix E 

The following documents are incorporated by reference in their entireties: 

1. Buchanan, K., et al., IEEE Pers. Comm. 4:8-13 (1997);

2. Reimers, U., IEEE Comm. Magazine 36:104-110 (1998);

3. Cho, K., "A frequency-agile single-chip QAM modulator with
beamforming diversity," Ph.D. dissertation, University of California, Los
Angeles (1999);

4. Oerder, M., and Meyr, H., IEEE Trans. Comm. 35:605-612 (1988);

5. Pollet, T., and Peeters, M., IEEE Comm. Magazine 37:80-86 (1999);

10. Conclusion

Example implementations of the methods, systems and components of the 
invention have been described herein. As noted elsewhere, these example 
implementations have been described for illustrative purposes only, and are not 
limiting. Other implementation embodiments are possible and covered by the 
invention, such as but not limited to software and software/hardware 
implementations of the systems and components of the invention. Such 
implementation embodiments will be apparent to persons skilled in the relevant 
art(s) based on the teachings contained herein. 

While various application embodiments of the present invention have been 
described above, it should be understood that they have been presented by way of 
example only, and not limitation. Thus, the breadth and scope of the present 
invention should not be limited by any of the above-described exemplary 
embodiments.

What Is Claimed Is:



1. In a digital device, a method of generating an output signal that represents
a polar angle φ for a complex input signal, the method comprising the steps of:
    (1) receiving the complex input signal having a real X0 component and
an imaginary Y0 component;
    (2) determining an angle φ1 that is a coarse approximation to the angle
φ, including the steps of
        (2a) determining a Z0 value that approximates a [1/X0] value,
wherein [X0] is a truncated approximation of said X0 component,
        (2b) digitally multiplying said Z0 value by Y0, resulting in a
[Y0Z0] value, and
        (2c) determining an arctan of said [Y0Z0] value, resulting in said
angle φ1;
    (3) determining a fine adjustment angle φ2, including the steps of
        (3a) digitally computing an intermediate complex number, based
on said [Y0/X0] value, said intermediate complex number having a real X1
component and an imaginary Y1 component,
        (3b) determining a Z1 that approximates a [1/X1] value, wherein
[X1] is a truncated approximation of said X1 component,
        (3c) digitally multiplying said X1 component by said [Z1] value
to produce a Z1X1 component, and digitally multiplying said Y1 component by said
[Z1] component to produce a Z1Y1 component,
        (3d) determining a one's complement of said Z1X1 component,
and
        (3e) digitally multiplying said two's complement of said Z1X1
component by said Z1Y1 component, resulting in said fine adjustment angle φ2;
and
    (4) adding said fine adjustment angle φ2 to said angle φ1 to form said
output signal that is data used by said digital device.

2. The method of claim 1, wherein step (2a) comprises the step of retrieving
said [Z0] value from a memory device.

3. The method of claim 1, wherein step (2c) comprises the step of retrieving
said angle φ1 value from a memory device.

4. The method of claim 1, wherein step (3b) comprises the step of retrieving
said [Z1] value from a memory device.

5. The method of claim 1, wherein step (2a) comprises the step of retrieving
said [Z0] value from a memory device, and wherein step (3b) comprises the step
of retrieving said [Z1] value from said memory device.

6. The method of claim 1, wherein said step (3a) comprises the step of
multiplying said X0 component and said Y0 component by a tan

7. The method of claim 1, wherein said step (3a) comprises the step of
multiplying said X0 component and said Y0 component by said [Z0Y0] value.

8. An apparatus that generates an output signal that represents a polar angle
φ for a complex input signal having a X0 component and a Y0 component,
comprising:
    a first memory that stores one or more Z0 values indexed by [X0], wherein
[X0] is a bit truncated version of said X0 value, wherein said Z0 value is
approximately 1/[X0];
    a multiplier that multiplies said Z0 value by the Y0 component, resulting in
a [Z0Y0] value;
    a second memory that stores one or more φ1 angles, wherein said φ1 angle
is approximately an arctan of [Z0Y0];
    a digital circuit that multiplies said X0 component and said Y0 component
by said [Z0Y0] value, resulting in an intermediate complex number having an X1
component and a Y1 component;
    a fine angle computation stage that determines an angle φ2 based on Y1/X1;
and
    an adder that adds φ1 + φ2 to produce said angle φ to form the output
signal that is data processed by said apparatus.

9. The apparatus of claim 8, wherein said fine angle computation stage
includes:
    a set of multipliers that multiply said X1 component and said Y1 component
by a Z1 value resulting in a X1Z1 component and a Y1Z1 component, wherein Z1
is a bit truncated version of 1/[X1], and wherein [X1] is a bit truncated version of
X1.

10. The apparatus of claim 9, wherein said Z1 value is retrieved from said first
memory based on said [X1] value.

11. The apparatus of claim 9, wherein said fine angle computation stage
further includes:
    a means for implementing a one's complement of said X1Z1; and
    a second multiplier for multiplying said one's complement of X1Z1 by said
Y1Z1 component.

12. The apparatus of claim 9, wherein said fine angle computation stage
further includes:
    a means for implementing a two's complement of said X1Z1; and
    a second multiplier for multiplying said two's complement of X1Z1 by said
Y1Z1 component.

13. The apparatus of claim 8, further comprising:
    a scaling shifter, coupled to said digital circuit, wherein said scaling shifter
scales said X1 component in accordance with reciprocal values that are stored in
said first memory.

14. The apparatus of claim 13, wherein said scaling shifter also scales said Y1
component similar to said scaling of said X1 component.

15. The apparatus of claim 8, wherein said digital circuit is a butterfly circuit
that is coupled to an output of said multiplier.

16. In a digital device, a method of generating an output signal that represents
a polar angle φ for a complex input signal, the method comprising the steps of:
    (1) receiving the complex input signal having a real X0 component and
an imaginary Y0 component;
    (2) retrieving a Z0 value from a first memory, wherein Z0 is a bit
truncated approximation for 1/X0;
    (3) digitally multiplying said Z0 value by said Y0 component, resulting
in a [Y0Z0] value;
    (4) retrieving an angle φ1 from a second memory, wherein φ1 is based
on an arctan of said [Y0Z0] value;
    (5) digitally rotating said input complex signal in a complex plane by
said angle φ1 to produce an intermediate complex signal having an X1 component
and a Y1 component;
    (6) digitally computing an angle φ2 that is an approximation to an
arctan Y1/X1; and
    (7) adding said angle φ2 to said angle φ1 to form the output signal that
is data used by said digital device.

17. The method of claim 16, wherein said step (6) comprises the steps of:
    (a) retrieving a Z1 value from said first memory, wherein said Z1 value
is a bit truncated approximation of 1/X1;
    (b) digitally multiplying said X1 component by said Z1 value to produce
a Z1X1 component, and digitally multiplying said Y1 component by said Z1 value
to produce a Z1Y1 component;
    (c) determining a one's complement of said Z1X1 component; and
    (d) multiplying said one's complement of said Z1X1 component by said
Z1Y1 component.

18. The method of claim 16, wherein step (5) comprises the step of multiplying
said input complex signal by a tan φ1.

19. The method of claim 16, wherein step (5) comprises the step of multiplying
said input complex signal by said [Y0Z0] value.

20. In a digital device, a method of symbol timing synchronization, the method
comprising the steps of:
    (1) receiving complex data samples of one or more symbols;
    (2) correlating said complex data samples with a complex conjugate
of a preamble data set, resulting in correlated complex data samples, each
correlated complex data sample represented by a real sample and an imaginary
sample;
    (3) selecting between said real samples and said imaginary samples,
resulting in a set of selected samples;
    (4) generating a complex number based on said set of selected
samples; and
    (5) determining an angle in a complex plane associated with said
complex number, whereby said angle represents symbol synchronization for the
communications device.

21. The method of claim 20, further comprising the step of:
    (5) multiplying said angle by nil to determine an offset μ that indicates
symbol synchronization.

22. The method of claim 20, wherein step (2) comprises the step of multiplying
said received complex data samples with said preamble data set.

23. The method of claim 20, wherein said step (3) comprises the step of
selecting the larger of said real samples and said imaginary samples.

24. The method of claim 20, wherein step (4) comprises the steps of:
    (a) determining a Fourier transform based on set of selected data
samples; and
    (b) evaluating said Fourier transform at π/2.

25. The method of claim 20, wherein step (4) comprises the steps of:
    (a) determining which of said selected data samples has the largest
magnitude;
    (b) selecting n adjacent samples from the selected data samples that
includes said largest magnitude sample;
    (c) determining a Fourier transform of said n adjacent data samples;
and
    (d) evaluating said Fourier transform at resulting in said complex
number.

26. The method of claim 20, wherein said complex number is in a rectangular
format, and wherein step (5) comprises the step of:
    converting said complex number to polar format having a magnitude and
said angle.

27. The method of claim 20, where step (4) comprises the steps of:
    (a) determining which of said selected data samples has the largest
magnitude;
    (b) selecting 4 adjacent samples from the selected data samples,
represented by r(-1), r(0), r(1), and r(2), wherein said largest magnitude data
sample is one of r(0) and r(1);
    (c) determining a Fourier transform of said 4 adjacent data samples;
and
    (d) evaluating said Fourier transform at π/2, resulting in said complex
number.

28. The method of claim 27, wherein step (c) comprises the steps of:
    (i) determining r(0) - r(2), to produce a real part of said complex
number; and
    (ii) determining r(-1) - r(1), to produce an imaginary part of said
complex number.

29. In a digital device, a method of carrier recovery, the method comprising
the steps of:
    (1) receiving complex data samples of one or more symbols;
    (2) correlating said complex data samples with a complex conjugate
of a preamble data set, resulting in correlated complex data samples;
    (3) selecting one of said correlated complex data samples;
    (4) determining an angle in said complex plane based on said selected
correlated complex data sample, whereby said angle represents a carrier phase
offset in the digital device.

30. The method of claim 29, wherein step (3) comprises the step of selecting
a largest of said correlated complex data samples.

31. The method of claim 29, wherein said complex number is in a rectangular
format, wherein step (4) comprises the step of converting said complex number
to a polar format having a magnitude and said angle.

32. In a digital device for generating an output signal that represents a polar
angle φ for a complex input digital signal, a method of converting Cartesian data
of said input digital signal to polar angle data of said output signal, comprising the
steps of:
    (1) receiving the input digital signal; and
    (2) determining at least two subangles, the combination of which
subangles represents the polar angle φ.

33. The method of claim 32, wherein step (2) comprises the step of: (a)
determining at least one subangle by using a memory device.

34. The method of claim 32, wherein said step (2) comprises the step of:
    (a) determining at least one subangle by using a trigonometric function
of a subangle as an approximation for the subangle.

35. The method of claim 34, wherein said step (a) comprises the step of:
    (i) determining said trigonometric function using a previously
determined subangle and said Cartesian data of said input digital signal.

36. The method of claim 35, wherein said step (i) comprises the step of
determining said trigonometric function by rotating said Cartesian data of said
input digital signal by said previously determined subangle.
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Apparatus and Method for Rectangular-to-Polar Conversion 

Abstract 

A rectangular-to-polar converter receives a complex input signal (having
X0 and Y0 components) and determines an angle φ that represents the position of
the complex signal in the complex plane. The rectangular-to-polar converter
determines a coarse angle φ1 and a fine angle φ2, where φ = φ1 + φ2. The coarse
angle φ1 is obtained using a small arctangent table and a reciprocal table. These
tables provide just enough precision such that the remaining fine angle φ2 is small
enough to approximately equal its tangent value. Therefore the fine angle φ2 can
be obtained without a look-up table, and the fine angle computations are
consolidated into a few small multipliers, given a precision requirement.
Applications of the rectangular-to-polar converter include symbol and carrier
synchronization, including symbol synchronization for bursty transmissions of
packet data systems. Other applications include any application requiring the
rectangular-to-polar conversion of a complex input signal.

[Figure: Booth Multiplier. Booth decode blocks BD1 through BD5 each select a
partial product (PP1 through PP5) of the multiplicand (M bits); a partial product
summing tree combines the partial products to form the product.]

* PP - Partial Product
* BD - Booth Decode

[Figure: MULTIPLIER (10 bit), bits b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 followed by an
appended 0, showing how the multiplier bits feed into decode blocks 5 through 1.]

Original Booth Table
b2  b1  b0    PP
0   0   0     0*A
0   0   1     1*A
0   1   0     1*A
0   1   1     2*A
1   0   0    -2*A
1   0   1    -1*A
1   1   0    -1*A
1   1   1     0*A


Negating Booth Table
b2  b1  b0    PP
0   0   0     0*A
0   0   1    -1*A
0   1   0    -1*A
0   1   1    -2*A
1   0   0     2*A
1   0   1     1*A
1   1   0     1*A
1   1   1     0*A
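
The two Booth tables can be exercised with a small software model. The sketch below is an illustration only, assuming the standard radix-4 Booth recoding (each decode block sees three overlapping bits, with a 0 appended below b0) and a 10-bit two's-complement multiplier as in the figure; the names `ORIGINAL_BOOTH`, `NEGATING_BOOTH` and `booth_multiply` are hypothetical.

```python
# Partial-product multiple of the multiplicand A for each (b2, b1, b0) pattern.
ORIGINAL_BOOTH = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}
# The negating table flips the sign of every entry, so the summed partial
# products give the negated product.
NEGATING_BOOTH = {bits: -multiple for bits, multiple in ORIGINAL_BOOTH.items()}

def booth_multiply(multiplicand, multiplier, bits=10, table=ORIGINAL_BOOTH):
    """Multiply via radix-4 Booth decoding of a two's-complement multiplier.

    `multiplier` must fit in `bits` bits (bits even); with the default
    10 bits there are five decode blocks, as in the figure.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 Booth decoding")
    # Two's-complement bit pattern with the implicit 0 appended below b0.
    pattern = (multiplier & ((1 << bits) - 1)) << 1
    product = 0
    for group in range(bits // 2):
        triple = (pattern >> (2 * group)) & 0b111
        key = ((triple >> 2) & 1, (triple >> 1) & 1, triple & 1)
        # Each decode block contributes its partial product weighted by 4**group.
        product += table[key] * multiplicand * (4 ** group)
    return product
```

For example, `booth_multiply(37, 25)` sums the partial products 1*A, -2*A*4 and 2*A*16, and passing `table=NEGATING_BOOTH` returns the same product with its sign reversed.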
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Unique word 
Preamble
User data field

[Figure: Common packet format.]

[Figure: The simplified system model, with carrier terms exp(j2πf_c t) and
exp(-j2πf_c t + jθ).]

[Figure: Preliminary symbol-timing estimation structure. Blocks: data in;
preamble correlator as in (6.1); rectangular-to-polar converter (Chap. 7).]



[Flowchart: symbol-timing estimation]

Receive a set of complex data samples
generated from sampling incoming symbols

Correlate the complex data samples with
a complex conjugate of a preamble data set

Determine the Fourier transform of the
correlated complex data samples

Evaluate the Fourier transform at π/2, generating
a complex signal representing a complex number

Determine an angle in a complex plane associated with
the complex number, where the angle represents
synchronization between the data samples and the incoming symbols

Scale the angle to determine the synchronization offset



[Figure: Bias due to truncation.]



[Flowchart: symbol-timing and carrier-phase estimation]

Receive a set of complex data samples
generated from sampling incoming symbols

Correlate the data samples with
a complex conjugate of a preamble data set

Select either the set of real correlated samples
or the set of imaginary correlated samples based on magnitude

Determine the Fourier transform of the
selected correlated samples

Evaluate the Fourier transform at π/2, generating
a complex signal representing a complex number

Determine an angle in the complex plane associated with
the complex number of step 6510, where the angle represents
synchronization between the data samples and the incoming symbols

Scale the angle by nil to determine the synchronization offset

Select the largest correlator complex output

Determine an angle in the complex plane associated with
the largest correlator complex output of step 6516, where the angle represents
carrier phase offset θ
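
The step "evaluate the Fourier transform at π/2" is inexpensive when four adjacent real samples r(-1), r(0), r(1), r(2) are used, because the complex exponentials reduce to ±1 and ±j. The sketch below illustrates this under assumptions: the function names are hypothetical, the samples are taken to be real, and the final scaling of the angle to a timing offset is left out because the scale factor is not legible in the source.

```python
import cmath
import math

def direct_transform(samples, omega):
    """Reference: sum of r(n) * exp(-j*omega*n) for n = -1, 0, 1, 2."""
    return sum(r * cmath.exp(-1j * omega * n)
               for n, r in zip((-1, 0, 1, 2), samples))

def transform_at_half_pi(r_m1, r_0, r_1, r_2):
    """Same transform at omega = pi/2, using only two subtractions.

    exp(-j*pi*n/2) equals j, 1, -j, -1 for n = -1, 0, 1, 2, so the real part
    is r(0) - r(2) and the imaginary part is r(-1) - r(1).
    """
    return complex(r_0 - r_2, r_m1 - r_1)

def timing_angle(r_m1, r_0, r_1, r_2):
    """Angle of the complex number; a rectangular-to-polar converter's output."""
    return cmath.phase(transform_at_half_pi(r_m1, r_0, r_1, r_2))

# Example with arbitrary sample values
example_samples = (0.3, 1.1, -0.4, 0.25)
angle = timing_angle(*example_samples)
```

In this model `cmath.phase` plays the role of the rectangular-to-polar converter; in a hardware design that step is where a coarse and fine angle computation would be applied.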



[Figure: Timing jitter variance, plotted against E_s/N_0.]

[Figure: Phase jitter variance, plotted against E_s/N_0.]

[Figure: Using Newton-Raphson iteration to find ...]

[Figure: One Newton-Raphson iteration.]
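
A single Newton-Raphson iteration for a reciprocal can be illustrated as follows. This is a generic sketch of the standard update z_new = z*(2 - x*z), not a transcription of the figure; the function names and numeric values are assumptions. The factor (2 - x*z) is the quantity that a complement of Z1X1 provides in fixed-point arithmetic, and multiplying it by Z1Y1 gives an estimate of Y1/X1.

```python
def newton_raphson_step(x, z):
    """One Newton-Raphson update of z toward 1/x: z * (2 - x*z)."""
    return z * (2.0 - x * z)

def fine_angle(x1, y1, z1):
    """Estimate y1/x1 from a rough reciprocal z1 ~ 1/x1 with one iteration."""
    return (y1 * z1) * (2.0 - x1 * z1)

# Example: x = 1.37 with a reciprocal looked up for the truncated value 1.34375
x = 1.37
z_initial = 1.0 / 1.34375
z_refined = newton_raphson_step(x, z_initial)

error_before = 1.0 - x * z_initial   # relative error of the table value
error_after = 1.0 - x * z_refined    # equals error_before squared
```

Because 1 - x*z_new = (1 - x*z)^2, a table that is accurate to a few bits yields roughly twice as many accurate bits after the iteration, which is why a very small reciprocal table can suffice.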



[Flowchart: rectangular-to-polar conversion]

Receive the input complex signal X0, Y0

Retrieve Z0 = 1/[X0] from reciprocal ROM
based on [X0]

Multiply Y0 by Z0 to form [Z0Y0]

Retrieve φ1 from arctan ROM based on [Z0Y0]

Multiply X0, Y0 by [Z0Y0],
to generate X1, Y1

Scale X1, Y1 so that X1 is compatible
with reciprocal ROM

Retrieve Z1 from reciprocal ROM based on X1

Determine φ2 using Newton-Raphson to estimate Z1Y1

φ = φ1 + φ2

[Figure: Interpolation in a non-center interval.]

[Figure: Impulse responses of the non-center-interval interpolation filter,
before (original) and after optimization.]

[Figure: Data Rate Expansion Circuit. Input samples (at data rate r) feed an
Add/Subtract Module and Angle-Rotation Modules with fixed μ = 1/L, μ = 2/L, ...,
μ = (L-1)/L; a multiplexer produces the output samples (at data rate Lr).]



